I'm looking for an implementation of MemoryStream that does not allocate memory as one big block, but as a collection of chunks. I want to store a few GB of data in memory (64-bit) and avoid the limitations caused by memory fragmentation.
Something like this:
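using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text;
class ChunkedMemoryStream : Stream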
{
private readonly List<byte[]> _chunks = new List<byte[]>();
private int _positionChunk;
private int _positionOffset;
private long _position;
public override bool CanRead
{
get { return true; }
}
public override bool CanSeek
{
get { return true; }
}
public override bool CanWrite
{
get { return true; }
}
public override void Flush() { }
public override long Length
{
get { return _chunks.Sum(c => c.Length); }
}
public override long Position
{
get
{
return _position;
}
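set
{
if (value < 0 || value > Length)
throw new ArgumentOutOfRangeException(nameof(value));
_position = value;
// walk the chunk list to find the chunk that holds `value` and the offset inside it
_positionChunk = 0;
long remaining = value;
while (_positionChunk < _chunks.Count && remaining >= _chunks[_positionChunk].Length)
{
remaining -= _chunks[_positionChunk].Length;
_positionChunk++;
}
_positionOffset = (int)remaining;
}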
}
public override int Read(byte[] buffer, int offset, int count)
{
int result = 0;
while ((count != 0) && (_positionChunk != _chunks.Count))
{
byte[] chunk = _chunks[_positionChunk];
int fromChunk = Math.Min(count, chunk.Length - _positionOffset);
Array.Copy(chunk, _positionOffset, buffer, offset, fromChunk);
offset += fromChunk;
count -= fromChunk;
result += fromChunk;
_position += fromChunk;
_positionOffset += fromChunk;
// only move to the next chunk once the current one is exhausted
if (_positionOffset == chunk.Length)
{
_positionOffset = 0;
_positionChunk++;
}
}
return result;
}
public override long Seek(long offset, SeekOrigin origin)
{
long newPos = 0;
switch (origin)
{
case SeekOrigin.Begin:
newPos = offset;
break;
case SeekOrigin.Current:
newPos = Position + offset;
break;
case SeekOrigin.End:
newPos = Length + offset;
break;
}
// clamp to [0, Length] and report the position actually set
Position = Math.Max(0, Math.Min(newPos, Length));
return Position;
}
public override void SetLength(long value)
{
throw new NotImplementedException();
}
public override void Write(byte[] buffer, int offset, int count)
{
while ((count != 0) && (_positionChunk != _chunks.Count))
{
byte[] chunk = _chunks[_positionChunk];
int toChunk = Math.Min(count, chunk.Length - _positionOffset);
Array.Copy(buffer, offset, chunk, _positionOffset, toChunk);
offset += toChunk;
count -= toChunk;
_position += toChunk;
_positionOffset += toChunk;
// only move to the next chunk once the current one is full
if (_positionOffset == chunk.Length)
{
_positionOffset = 0;
_positionChunk++;
}
}
if (count != 0)
{
byte[] chunk = new byte[count];
Array.Copy(buffer, offset, chunk, 0, count);
_chunks.Add(chunk);
_positionChunk = _chunks.Count;
_position += count;
}
}
}
class Program
{
static void Main(string[] args)
{
ChunkedMemoryStream cms = new ChunkedMemoryStream();
Debug.Assert(cms.Length == 0);
Debug.Assert(cms.Position == 0);
cms.Position = 0;
byte[] helloworld = Encoding.UTF8.GetBytes("hello world");
cms.Write(helloworld, 0, 3);
cms.Write(helloworld, 3, 3);
cms.Write(helloworld, 6, 5);
Debug.Assert(cms.Length == 11);
Debug.Assert(cms.Position == 11);
cms.Position = 0;
byte[] b = new byte[20];
cms.Read(b, 3, (int)cms.Length);
Debug.Assert(b.Skip(3).Take(11).SequenceEqual(helloworld));
cms.Position = 0;
cms.Write(Encoding.UTF8.GetBytes("seeya"), 0, 5);
Debug.Assert(cms.Length == 11);
Debug.Assert(cms.Position == 5);
cms.Position = 0;
cms.Read(b, 0, (int)cms.Length);
Debug.Assert(b.Take(11).SequenceEqual(Encoding.UTF8.GetBytes("seeya world")));
Debug.Assert(cms.Length == 11);
Debug.Assert(cms.Position == 11);
cms.Write(Encoding.UTF8.GetBytes(" again"), 0, 6);
Debug.Assert(cms.Length == 17);
Debug.Assert(cms.Position == 17);
cms.Position = 0;
cms.Read(b, 0, (int)cms.Length);
Debug.Assert(b.Take(17).SequenceEqual(Encoding.UTF8.GetBytes("seeya world again")));
}
}
First determine whether virtual address space fragmentation is actually the problem.
If you are on a 64-bit machine (which you seem to indicate you are), I seriously doubt it is. Each 64-bit process has an enormous virtual address space (128 TB of user-mode address space on 64-bit Windows 8.1 and later), and your only worry is virtual address space fragmentation, not physical memory fragmentation (which is what the operating system must worry about). The OS memory manager already pages memory under the covers. For the foreseeable future you will run out of physical memory long before you run out of virtual address space. This is unlikely to change before we both retire.
If you have a 32-bit address space, then allocating contiguous blocks of memory in the GB range will run into fragmentation problems quite quickly. There is no stock chunk-allocating memory stream in the CLR. There is one under the covers in ASP.NET (for other reasons), but it is not accessible. If you must travel this path, you are probably better off writing one yourself anyway, because the usage pattern of your application is unlikely to be similar to many others, and trying to fit your data into a 32-bit address space will likely be your perf bottleneck.
I highly recommend requiring a 64-bit process if you are manipulating GBs of data. It will do a much better job than hand-rolled solutions to 32-bit address space fragmentation, regardless of how clever you are.
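A minimal sketch of enforcing that requirement at startup, assuming you prefer failing fast over silently running as a 32-bit process. `Environment.Is64BitProcess` is a standard .NET API; the `ProcessCheck` class and its members are hypothetical names used only for illustration.

```csharp
using System;

public static class ProcessCheck
{
    // Environment.Is64BitProcess is true when pointers are 8 bytes (IntPtr.Size == 8),
    // i.e. the process has a 64-bit virtual address space for large allocations.
    public static bool Is64Bit => Environment.Is64BitProcess;

    // Hypothetical startup guard: refuse to run instead of fighting
    // 32-bit address space fragmentation with GBs of in-memory data.
    public static void Require64BitProcess()
    {
        if (!Is64Bit)
        {
            throw new PlatformNotSupportedException(
                "This application keeps GBs of data in memory and must run as a 64-bit process.");
        }
    }
}
```

Call `ProcessCheck.Require64BitProcess()` early in `Main`. On the build side, the same decision is expressed by the project's `PlatformTarget` (for example `x64`) or by turning off `Prefer32Bit` for AnyCPU executables.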
The Bing team has released RecyclableMemoryStream and wrote about it in a blog post. The benefits they cite are:
Eliminate Large Object Heap allocations by using pooled buffers
Incur far fewer gen 2 GCs, and spend far less time paused due to GC
Avoid memory leaks by having a bounded pool size
Avoid memory fragmentation
Provide excellent debuggability
Provide metrics for performance tracking
I ran into a similar problem in my application. I read a large amount of compressed data and got an OutOfMemoryException when using MemoryStream. I've written my own implementation of a "chunked" memory stream based on a collection of byte arrays. If you have any ideas on how to make this memory stream more efficient, please let me know.
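using System;
using System.Collections.ObjectModel;
using System.Diagnostics.Contracts;
using System.IO;
using System.Linq;
public sealed class ChunkedMemoryStream : Stream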
{
#region Constants
private const int BUFFER_LENGTH = 65536;
private const byte ONE = 1;
private const byte TWO = 2;
private const byte ZERO = 0;
#endregion
#region Readonly & Static Fields
private readonly Collection<byte[]> _chunks;
#endregion
#region Fields
private long _length;
private long _position;
#endregion
#region C'tors
public ChunkedMemoryStream()
{
_chunks = new Collection<byte[]> { new byte[BUFFER_LENGTH], new byte[BUFFER_LENGTH] };
_position = ZERO;
_length = ZERO;
}
#endregion
#region Instance Properties
public override bool CanRead
{
get { return true; }
}
public override bool CanSeek
{
get { return true; }
}
public override bool CanWrite
{
get { return true; }
}
public override long Length
{
get { return _length; }
}
public override long Position
{
get { return _position; }
set
{
if (!CanSeek)
throw new NotSupportedException();
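// Position == Length is valid (end of stream); anything outside [0, Length] is rejected
if (value < ZERO || value > _length)
throw new ArgumentOutOfRangeException(nameof(value));
_position = value;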
}
}
private byte[] CurrentChunk
{
get
{
long positionDividedByBufferLength = _position / BUFFER_LENGTH;
var chunkIndex = Convert.ToInt32(positionDividedByBufferLength);
byte[] chunk = _chunks[chunkIndex];
return chunk;
}
}
private int PositionInChunk
{
get
{
int positionInChunk = Convert.ToInt32(_position % BUFFER_LENGTH);
return positionInChunk;
}
}
private int RemainingBytesInCurrentChunk
{
get
{
Contract.Ensures(Contract.Result<int>() > ZERO);
int remainingBytesInCurrentChunk = CurrentChunk.Length - PositionInChunk;
return remainingBytesInCurrentChunk;
}
}
#endregion
#region Instance Methods
public override void Flush()
{
}
public override int Read(byte[] buffer, int offset, int count)
{
if (buffer == null)
throw new ArgumentNullException(nameof(buffer));
if (offset < ZERO || count < ZERO)
throw new ArgumentOutOfRangeException(offset < ZERO ? nameof(offset) : nameof(count));
if (buffer.Length - offset < count)
throw new ArgumentException("offset + count exceeds the buffer length.");
if (!CanRead)
throw new NotSupportedException();
int bytesToRead = count;
if (_length - _position < bytesToRead)
bytesToRead = Convert.ToInt32(_length - _position);
int bytesreaded = 0;
while (bytesToRead > ZERO)
{
// get remaining bytes in current chunk
// read bytes in current chunk
// advance to next position
int remainingBytesInCurrentChunk = RemainingBytesInCurrentChunk;
if (remainingBytesInCurrentChunk > bytesToRead)
remainingBytesInCurrentChunk = bytesToRead;
Array.Copy(CurrentChunk, PositionInChunk, buffer, offset, remainingBytesInCurrentChunk);
//move position in source
_position += remainingBytesInCurrentChunk;
//move position in target
offset += remainingBytesInCurrentChunk;
//bytesToRead is smaller
bytesToRead -= remainingBytesInCurrentChunk;
//count readed bytes;
bytesreaded += remainingBytesInCurrentChunk;
}
return bytesreaded;
}
public override long Seek(long offset, SeekOrigin origin)
{
switch (origin)
{
case SeekOrigin.Begin:
Position = offset;
break;
case SeekOrigin.Current:
Position += offset;
break;
case SeekOrigin.End:
Position = Length + offset;
break;
}
return Position;
}
private long Capacity
{
get
{
int numberOfChunks = _chunks.Count;
// multiply as long: int * int overflows once the stream passes 2 GB
long capacity = (long)numberOfChunks * BUFFER_LENGTH;
return capacity;
}
}
public override void SetLength(long value)
{
if (value < ZERO)
throw new ArgumentOutOfRangeException(nameof(value));
if (value < _length)
{
// zero the truncated bytes so they do not reappear if the stream grows again
long start = value;
while (start < _length)
{
var index = (int)(start / BUFFER_LENGTH);
var pos = (int)(start % BUFFER_LENGTH);
var n = (int)Math.Min(BUFFER_LENGTH - pos, _length - start);
Array.Clear(_chunks[index], pos, n);
start += n;
}
//remove chunks that are no longer needed, but leave at least two chunks and keep Capacity > value
while (_chunks.Count > TWO && Capacity - BUFFER_LENGTH > value)
_chunks.RemoveAt(_chunks.Count - 1);
}
//grow: keep Capacity > value so that CurrentChunk stays valid when Position == Length
while (value >= Capacity)
_chunks.Add(new byte[BUFFER_LENGTH]);
_length = value;
if (_position > _length)
_position = _length;
}
public override void Write(byte[] buffer, int offset, int count)
{
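if (buffer == null)
throw new ArgumentNullException(nameof(buffer));
if (offset < ZERO || count < ZERO)
throw new ArgumentOutOfRangeException(offset < ZERO ? nameof(offset) : nameof(count));
if (buffer.Length - offset < count)
throw new ArgumentException("offset + count exceeds the buffer length.");
if (!CanWrite)
throw new NotSupportedException();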
int bytesToWrite = count;
while (bytesToWrite > ZERO)
{
//get remaining space in current chunk
int remainingBytesInCurrentChunk = RemainingBytesInCurrentChunk;
//if count of bytes to be written is fewer than remaining
if (remainingBytesInCurrentChunk > bytesToWrite)
remainingBytesInCurrentChunk = bytesToWrite;
//if remaining bytes is still greater than zero
if (remainingBytesInCurrentChunk > ZERO)
{
//write remaining bytes to current Chunk
Array.Copy(buffer, offset, CurrentChunk, PositionInChunk, remainingBytesInCurrentChunk);
//change offset of source array
offset += remainingBytesInCurrentChunk;
//change bytes to write
bytesToWrite -= remainingBytesInCurrentChunk;
//advance position; extend length only when writing past the current end
_position += remainingBytesInCurrentChunk;
if (_position > _length)
_length = _position;
}
if (Capacity == _position)
_chunks.Add(new byte[BUFFER_LENGTH]);
}
}
/// <summary>
/// Gets entire content of stream regardless of Position value and return output as byte array
/// </summary>
/// <returns>byte array</returns>
public byte[] ToArray()
{
var outputArray = new byte[Length];
if (outputArray.Length != ZERO)
{
long outputPosition = ZERO;
foreach (byte[] chunk in _chunks)
{
var remainingLength = (Length - outputPosition) > chunk.Length
? chunk.Length
: Length - outputPosition;
Array.Copy(chunk, ZERO, outputArray, outputPosition, remainingLength);
outputPosition = outputPosition + remainingLength;
}
}
return outputArray;
}
/// <summary>
/// Method set Position to first element and write entire stream to another
/// </summary>
/// <param name="stream">Target stream</param>
public void WriteTo(Stream stream)
{
Contract.Requires(stream != null);
Position = ZERO;
var buffer = new byte[BUFFER_LENGTH];
int bytesReaded;
do
{
bytesReaded = Read(buffer, ZERO, BUFFER_LENGTH);
stream.Write(buffer, ZERO, bytesReaded);
} while (bytesReaded > ZERO);
}
#endregion
}
Here is a full implementation:
using System;
using System.Collections.Generic;
using System.IO;
/// <summary>
/// Defines a MemoryStream that does not sit on the Large Object Heap, thus avoiding memory fragmentation.
/// </summary>
/// <seealso cref="Stream" />
public sealed class ChunkedMemoryStream : Stream
{
/// <summary>
/// Defines the default chunk size. Currently defined as 0x10000.
/// </summary>
public const int DefaultChunkSize = 0x10000; // needs to be < 85000
private const int _lohSize = 85000;
private List<byte[]> _chunks = new List<byte[]>();
private long _position;
private int _chunkSize;
private int _lastChunkPos;
private int _lastChunkPosIndex;
/// <summary>
/// Initializes a new instance of the <see cref="ChunkedMemoryStream" /> class based on the specified byte array.
/// </summary>
/// <param name="chunkSize">Size of the underlying chunks.</param>
/// <param name="buffer">The array of unsigned bytes from which to create the current stream.</param>
public ChunkedMemoryStream(int chunkSize = DefaultChunkSize, byte[] buffer = null)
{
FreeOnDispose = true;
ChunkSize = chunkSize;
_chunks.Add(new byte[chunkSize]);
if (buffer != null)
{
Write(buffer, 0, buffer.Length);
Position = 0;
}
}
/// <summary>
/// Gets or sets a value indicating whether to free the underlying chunks on dispose.
/// </summary>
/// <value>
/// <c>true</c> if the underlying chunks must be freed on disposal; otherwise, <c>false</c>.
/// </value>
public bool FreeOnDispose { get; set; }
/// <summary>
/// Releases the unmanaged resources used by the <see cref="Stream" /> and optionally releases the managed resources.
/// </summary>
/// <param name="disposing">true to release both managed and unmanaged resources; false to release only unmanaged resources.</param>
protected override void Dispose(bool disposing)
{
if (FreeOnDispose)
{
if (_chunks != null)
{
_chunks = null;
_chunkSize = 0;
_position = 0;
}
}
base.Dispose(disposing);
}
/// <summary>
/// When overridden in a derived class, clears all buffers for this stream and causes any buffered data to be written to the underlying device.
/// This implementation does nothing.
/// </summary>
public override void Flush()
{
// do nothing
}
/// <summary>
/// When overridden in a derived class, reads a sequence of bytes from the current stream and advances the position within the stream by the number of bytes read.
/// </summary>
/// <param name="buffer">An array of bytes. When this method returns, the buffer contains the specified byte array with the values between <paramref name="offset" /> and (<paramref name="offset" /> + <paramref name="count" /> - 1) replaced by the bytes read from the current source.</param>
/// <param name="offset">The zero-based byte offset in <paramref name="buffer" /> at which to begin storing the data read from the current stream.</param>
/// <param name="count">The maximum number of bytes to be read from the current stream.</param>
/// <returns>
/// The total number of bytes read into the buffer. This can be less than the number of bytes requested if that many bytes are not currently available, or zero (0) if the end of the stream has been reached.
/// </returns>
/// <exception cref="ArgumentNullException"><paramref name="buffer" /> is null.</exception>
/// <exception cref="ArgumentOutOfRangeException"><paramref name="offset" /> or <paramref name="count" /> is negative.</exception>
/// <exception cref="ArgumentException">The sum of <paramref name="offset" /> and <paramref name="count" /> is larger than the buffer length.</exception>
/// <exception cref="ObjectDisposedException">Methods were called after the stream was closed.</exception>
public override int Read(byte[] buffer, int offset, int count)
{
if (buffer == null)
throw new ArgumentNullException(nameof(buffer));
if (offset < 0)
throw new ArgumentOutOfRangeException(nameof(offset));
if (count < 0)
throw new ArgumentOutOfRangeException(nameof(count));
if ((buffer.Length - offset) < count)
throw new ArgumentException(null, nameof(count));
CheckDisposed();
var chunkIndex = (int)(_position / ChunkSize);
if (chunkIndex == _chunks.Count)
return 0;
var chunkPos = (int)(_position % ChunkSize);
count = (int)Math.Min(count, Length - _position);
if (count == 0)
return 0;
var left = count;
var inOffset = offset;
var total = 0;
do
{
var toCopy = Math.Min(left, ChunkSize - chunkPos);
Buffer.BlockCopy(_chunks[chunkIndex], chunkPos, buffer, inOffset, toCopy);
inOffset += toCopy;
left -= toCopy;
total += toCopy;
if ((chunkPos + toCopy) == ChunkSize)
{
if (chunkIndex == (_chunks.Count - 1))
{
// last chunk
break;
}
chunkPos = 0;
chunkIndex++;
}
else
{
chunkPos += toCopy;
}
}
while (left > 0);
_position += total;
return total;
}
/// <summary>
/// Reads a byte from the stream and advances the position within the stream by one byte, or returns -1 if at the end of the stream.
/// </summary>
/// <returns>
/// The unsigned byte cast to an Int32, or -1 if at the end of the stream.
/// </returns>
/// <exception cref="ObjectDisposedException">Methods were called after the stream was closed.</exception>
public override int ReadByte()
{
CheckDisposed();
if (_position >= Length)
return -1;
var ret = _chunks[(int)(_position / ChunkSize)][_position % ChunkSize];
_position++;
return ret;
}
/// <summary>
/// When overridden in a derived class, sets the position within the current stream.
/// </summary>
/// <param name="offset">A byte offset relative to the <paramref name="origin" /> parameter.</param>
/// <param name="origin">A value of type <see cref="SeekOrigin" /> indicating the reference point used to obtain the new position.</param>
/// <returns>The new position within the current stream.</returns>
/// <exception cref="ObjectDisposedException">Methods were called after the stream was closed.</exception>
public override long Seek(long offset, SeekOrigin origin)
{
CheckDisposed();
switch (origin)
{
case SeekOrigin.Begin:
Position = offset;
break;
case SeekOrigin.Current:
Position += offset;
break;
case SeekOrigin.End:
Position = Length + offset;
break;
}
return Position;
}
private void CheckDisposed()
{
if (_chunks == null)
throw new ObjectDisposedException(null, "Cannot access a disposed stream.");
}
/// <summary>
/// When overridden in a derived class, sets the length of the current stream.
/// </summary>
/// <param name="value">The desired length of the current stream in bytes.</param>
/// <exception cref="ArgumentOutOfRangeException"><paramref name="value" /> is out of range.</exception>
/// <exception cref="ObjectDisposedException">Methods were called after the stream was closed.</exception>
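public override void SetLength(long value)
{
CheckDisposed();
if (value < 0)
throw new ArgumentOutOfRangeException(nameof(value));
// the last chunk is never full (Write adds a fresh chunk as soon as one fills up),
// so a stream of `value` bytes needs value / ChunkSize + 1 chunks
var needed = value / ChunkSize + 1;
if (needed > int.MaxValue)
throw new ArgumentOutOfRangeException(nameof(value));
var shrinking = value < Length;
while (_chunks.Count > needed)
{
_chunks.RemoveAt(_chunks.Count - 1);
}
while (_chunks.Count < needed)
{
_chunks.Add(new byte[ChunkSize]);
}
_lastChunkPosIndex = _chunks.Count - 1;
_lastChunkPos = (int)(value % ChunkSize);
if (shrinking)
{
// zero the truncated tail of the new last chunk so stale bytes cannot reappear if the stream grows again
Array.Clear(_chunks[_lastChunkPosIndex], _lastChunkPos, ChunkSize - _lastChunkPos);
}
if (_position > value)
_position = value;
}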
/// <summary>
/// Converts the current stream to a byte array.
/// </summary>
/// <returns>
/// An array of bytes
/// </returns>
public byte[] ToArray()
{
CheckDisposed();
var bytes = new byte[Length];
var offset = 0;
for (var i = 0; i < _chunks.Count; i++)
{
var count = (i == (_chunks.Count - 1)) ? _lastChunkPos : _chunks[i].Length;
if (count > 0)
{
Buffer.BlockCopy(_chunks[i], 0, bytes, offset, count);
offset += count;
}
}
return bytes;
}
/// <summary>
/// When overridden in a derived class, writes a sequence of bytes to the current stream and advances the current position within this stream by the number of bytes written.
/// </summary>
/// <param name="buffer">An array of bytes. This method copies <paramref name="count" /> bytes from <paramref name="buffer" /> to the current stream.</param>
/// <param name="offset">The zero-based byte offset in <paramref name="buffer" /> at which to begin copying bytes to the current stream.</param>
/// <param name="count">The number of bytes to be written to the current stream.</param>
/// <exception cref="ArgumentException">The sum of <paramref name="offset" /> and <paramref name="count" /> is greater than the buffer length.</exception>
/// <exception cref="ArgumentNullException"><paramref name="buffer" /> is null.</exception>
/// <exception cref="ArgumentOutOfRangeException"><paramref name="offset" /> or <paramref name="count" /> is negative.</exception>
/// <exception cref="ObjectDisposedException">Methods were called after the stream was closed.</exception>
public override void Write(byte[] buffer, int offset, int count)
{
if (buffer == null)
throw new ArgumentNullException(nameof(buffer));
if (offset < 0)
throw new ArgumentOutOfRangeException(nameof(offset));
if (count < 0)
throw new ArgumentOutOfRangeException(nameof(count));
if ((buffer.Length - offset) < count)
throw new ArgumentException(null, nameof(count));
CheckDisposed();
var chunkPos = (int)(_position % ChunkSize);
var chunkIndex = (int)(_position / ChunkSize);
if (chunkIndex == _chunks.Count)
{
_chunks.Add(new byte[ChunkSize]);
}
var left = count;
var inOffset = offset;
do
{
var copied = Math.Min(left, ChunkSize - chunkPos);
Buffer.BlockCopy(buffer, inOffset, _chunks[chunkIndex], chunkPos, copied);
inOffset += copied;
left -= copied;
if ((chunkPos + copied) == ChunkSize)
{
chunkIndex++;
chunkPos = 0;
if (chunkIndex == _chunks.Count)
{
_chunks.Add(new byte[ChunkSize]);
}
}
else
{
chunkPos += copied;
}
}
while (left > 0);
_position += count;
if (chunkIndex == (_chunks.Count - 1))
{
if (chunkIndex > _lastChunkPosIndex || (chunkIndex == _lastChunkPosIndex && chunkPos > _lastChunkPos))
{
_lastChunkPos = chunkPos;
_lastChunkPosIndex = chunkIndex;
}
}
}
/// <summary>
/// Writes a byte to the current position in the stream and advances the position within the stream by one byte.
/// </summary>
/// <param name="value">The byte to write to the stream.</param>
/// <exception cref="ObjectDisposedException">Methods were called after the stream was closed.</exception>
public override void WriteByte(byte value)
{
CheckDisposed();
var chunkIndex = (int)(_position / ChunkSize);
var chunkPos = (int)(_position % ChunkSize);
if (chunkIndex == _chunks.Count)
{
_chunks.Add(new byte[ChunkSize]);
}
_chunks[chunkIndex][chunkPos++] = value;
_position++;
if (chunkPos == ChunkSize)
{
// the chunk is now full: move to a fresh chunk, as Write does
chunkIndex++;
chunkPos = 0;
if (chunkIndex == _chunks.Count)
{
_chunks.Add(new byte[ChunkSize]);
}
}
if (chunkIndex == (_chunks.Count - 1))
{
if (chunkIndex > _lastChunkPosIndex || (chunkIndex == _lastChunkPosIndex && chunkPos > _lastChunkPos))
{
_lastChunkPos = chunkPos;
_lastChunkPosIndex = chunkIndex;
}
}
}
/// <summary>
/// Writes to the specified stream.
/// </summary>
/// <param name="stream">The stream.</param>
/// <exception cref="ArgumentNullException"><paramref name="stream" /> is null.</exception>
public void WriteTo(Stream stream)
{
if (stream == null)
throw new ArgumentNullException(nameof(stream));
CheckDisposed();
for (var i = 0; i < _chunks.Count; i++)
{
var count = i == (_chunks.Count - 1) ? _lastChunkPos : _chunks[i].Length;
stream.Write(_chunks[i], 0, count);
}
}
/// <summary>
/// When overridden in a derived class, gets a value indicating whether the current stream supports reading.
/// </summary>
public override bool CanRead => true;
/// <summary>
/// When overridden in a derived class, gets a value indicating whether the current stream supports seeking.
/// </summary>
public override bool CanSeek => true;
/// <summary>
/// When overridden in a derived class, gets a value indicating whether the current stream supports writing.
/// </summary>
public override bool CanWrite => true;
/// <summary>
/// When overridden in a derived class, gets the length in bytes of the stream.
/// </summary>
/// <exception cref="ObjectDisposedException">Methods were called after the stream was closed.</exception>
public override long Length
{
get
{
CheckDisposed();
if (_chunks.Count == 0)
return 0;
return (long)(_chunks.Count - 1) * ChunkSize + _lastChunkPos;
}
}
/// <summary>
/// Gets or sets the size of the underlying chunks. Cannot be greater than or equal to 85000.
/// </summary>
/// <value>
/// The chunks size.
/// </value>
/// <exception cref="ArgumentOutOfRangeException"><paramref name="value" /> is out of range.</exception>
public int ChunkSize
{
get => _chunkSize;
set
{
if (value <= 0 || value >= _lohSize)
throw new ArgumentOutOfRangeException(nameof(value));
_chunkSize = value;
}
}
/// <summary>
/// When overridden in a derived class, gets or sets the position within the current stream.
/// </summary>
/// <exception cref="ArgumentOutOfRangeException"><paramref name="value" /> is out of range.</exception>
/// <exception cref="ObjectDisposedException">Methods were called after the stream was closed.</exception>
public override long Position
{
get
{
CheckDisposed();
return _position;
}
set
{
CheckDisposed();
if (value < 0)
throw new ArgumentOutOfRangeException(nameof(value));
if (value > Length)
throw new ArgumentOutOfRangeException(nameof(value));
_position = value;
}
}
}
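The 85000-byte bound on ChunkSize mirrors the runtime's default Large Object Heap threshold. A small sketch to observe it, assuming the default GC settings (newer runtimes let you change the threshold through the `System.GC.LOHThreshold` configuration setting): `GC.GetGeneration` reports an array on the LOH as being in the oldest generation straight after allocation, whereas a 64 KB chunk like `DefaultChunkSize` starts in a younger one. `LohDemo` is a hypothetical helper name.

```csharp
using System;

public static class LohDemo
{
    // Allocates a byte array of the given length and returns the GC generation the
    // runtime reports for it right after allocation. Arrays whose total object size
    // reaches the LOH threshold (85000 bytes by default) go to the Large Object Heap,
    // which GC.GetGeneration reports as the oldest generation (GC.MaxGeneration).
    public static int GenerationOf(int length)
    {
        var array = new byte[length];
        int generation = GC.GetGeneration(array);
        GC.KeepAlive(array); // keep the array reachable until the query is done
        return generation;
    }
}
```

For example, `LohDemo.GenerationOf(ChunkedMemoryStream.DefaultChunkSize)` should report a young generation, while `LohDemo.GenerationOf(100_000)` should report `GC.MaxGeneration`.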
You should use UnmanagedMemoryStream when dealing with more than 2 GB of memory: MemoryStream is limited to 2 GB because it is backed by a single byte array indexed by int, while UnmanagedMemoryStream works over unmanaged memory and uses long for its length, capacity and position.
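A minimal sketch, assuming a 64-bit process: allocate a native block with `Marshal.AllocHGlobal`, wrap it in a `SafeBuffer` subclass so it is freed deterministically, and expose it through the `UnmanagedMemoryStream(SafeBuffer, long, long, FileAccess)` constructor. `HGlobalBuffer` and `NativeStreams.Create` are hypothetical names. Unlike MemoryStream, the capacity is fixed when the stream is created (writing past it throws), the memory is not zero-initialized, and with this constructor `Length` equals the full buffer size from the start.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Owns a block of native memory allocated with Marshal.AllocHGlobal.
public sealed class HGlobalBuffer : SafeBuffer
{
    public HGlobalBuffer(long byteLength) : base(true)
    {
        SetHandle(Marshal.AllocHGlobal(new IntPtr(byteLength)));
        Initialize((ulong)byteLength); // tells SafeBuffer how many bytes may be accessed
    }

    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        return true;
    }
}

public static class NativeStreams
{
    // Returns a read/write stream over the whole buffer. The stream does not own
    // the buffer: dispose the stream first, then the buffer.
    public static UnmanagedMemoryStream Create(HGlobalBuffer buffer) =>
        new UnmanagedMemoryStream(buffer, 0, (long)buffer.ByteLength, FileAccess.ReadWrite);
}
```

In real use the size passed to `HGlobalBuffer` could be, for example, `4L * 1024 * 1024 * 1024` in a 64-bit process.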
SparseMemoryStream does this in .NET, but it's buried deep down in an internal class library. The source code is available, of course, since Microsoft has published it.
You can grab the code for it here: http://www.dotnetframework.org/default.aspx/4#0/4#0/DEVDIV_TFS/Dev10/Releases/RTMRel/wpf/src/Base/MS/Internal/IO/Packaging/SparseMemoryStream#cs/1305600/SparseMemoryStream#cs
That being said, I highly recommend not using it as is. At the very least, start by removing all the calls to IsolatedStorage, as this seems to be the cause of no end of bugs* in the framework's packaging API.
(*: In addition to spreading the data around in streams, if it gets too big it basically reinvents swap files for some reason, in the user's Isolated Storage no less. Coincidentally, most MS products that allow .NET-based add-ins do not have their app domains set up in such a way that you can access Isolated Storage; VSTO add-ins, for example, are notorious for suffering from this issue.)
Another implementation of a chunked stream could be considered as a stock MemoryStream replacement. Additionally, it allows allocating a single large byte array on the LOH, which is used as a "chunk" pool shared between all ChunkedStream instances.
https://github.com/ImmortalGAD/ChunkedStream
Related
I have a JsonNode to write to a file.
The JSON contains a string with a special character in it: "🐕".
It's written as "\uD83D\uDC15", which is not exactly what I want.
JSON files support UTF-8, and "🐕" (U+1F415) is a valid Unicode code point that UTF-8 encodes as 4 bytes:
0xF0 0x9F 0x90 0x95.
Instead, it gets translated into 12 bytes, just in case I want to edit the file on an old terminal from the 80s. I'm not interested in that; I actually use Visual Studio Code to edit the file.
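To make the byte counts concrete: the escape `\uD83D\uDC15` is a UTF-16 surrogate pair. A small sketch using `char.ConvertToUtf32` and `Encoding.UTF8` (both standard .NET APIs; `EscapeMath` is a hypothetical helper name) shows that the pair combines into U+1F415, which UTF-8 encodes in 4 bytes, while the escaped text form takes 12 ASCII bytes.

```csharp
using System;
using System.Text;

public static class EscapeMath
{
    // Combines a UTF-16 surrogate pair into a Unicode code point:
    // 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00)
    public static int CodePoint(char high, char low) => char.ConvertToUtf32(high, low);

    // UTF-8 bytes of the character formed by the pair.
    public static byte[] Utf8Bytes(char high, char low) =>
        Encoding.UTF8.GetBytes(new string(new[] { high, low }));

    // Byte length of the JSON escape form, e.g. "\uD83D\uDC15" (6 ASCII bytes per UTF-16 unit).
    public static int EscapedLength(char high, char low) =>
        Encoding.ASCII.GetByteCount($"\\u{(int)high:X4}\\u{(int)low:X4}");
}
```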
How can I force it to write the character without such unwanted translations?
BTW, the file is deserialized correctly: the deserialized string contains the valid Unicode code point. So the application basically works, but I'm really curious how to change the serialization behavior.
In case anyone is curious about the code, here it is:
public virtual void Save(JsonNode node, Stream stream) {
if (node is null) return;
using var writer = new Utf8JsonWriter(stream, WriterOptions);
node.WriteTo(writer, SerializerOptions);
}
...where WriterOptions is:
public JsonWriterOptions WriterOptions { get; } = new() { Indented = true, SkipValidation = true };
...and SerializerOptions is:
public JsonSerializerOptions SerializerOptions { get; } = new() { WriteIndented = true };
Here's an example project showing the issue:
https://github.com/HTD/JsonNodeUtfIssue/blob/master/JsonNodeUtfIssue/Program.cs
https://dotnetfiddle.net/73RxAd
Here's my workaround: a decoding stream.
When Utf8JsonWriter writes UTF-8 bytes to the stream, my Utf8DecodeStream searches for Unicode escape sequences, decodes them to UTF-8 bytes and writes those bytes instead of the original escape sequences.
It's relatively fast, because it avoids regular expressions, string search/replacement, string-to-number conversions, avoidable allocations and so on.
It operates directly on binary data (the original stream's buffer). It may fail to replace a sequence that is split between 2 writes. In that case the file will not be damaged; that one sequence will just be left unchanged.
OK, there is one special case: if a write boundary splits an escaped surrogate pair (such as "\uD83D\uDC15") into its 2 valid 16-bit escapes, the decoding is invalid, because each half is decoded to UTF-8 on its own, and the concatenated bytes do not form the single 4-byte UTF-8 sequence of the original code point.
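The failure mode described above can be reproduced directly. This sketch assumes `Encoding.UTF8`'s default replacement fallback: encoding each half of a surrogate pair in a separate call yields the replacement character U+FFFD (bytes EF BF BD) twice, whereas encoding the pair in one call yields the proper 4-byte sequence. `SplitPairDemo` is a hypothetical name.

```csharp
using System.Text;

public static class SplitPairDemo
{
    // Simulates the two halves of "\uD83D\uDC15" arriving in separate writes
    // and each being encoded to UTF-8 on its own.
    public static byte[] EncodeHalvesSeparately(char high, char low)
    {
        byte[] first = Encoding.UTF8.GetBytes(new[] { high }); // lone high surrogate
        byte[] second = Encoding.UTF8.GetBytes(new[] { low });  // lone low surrogate
        var result = new byte[first.Length + second.Length];
        first.CopyTo(result, 0);
        second.CopyTo(result, first.Length);
        return result;
    }

    // Encoding the complete pair in one call produces the correct 4-byte sequence.
    public static byte[] EncodeTogether(char high, char low) =>
        Encoding.UTF8.GetBytes(new[] { high, low });
}
```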
BTW, I don't know whether Utf8JsonWriter ever splits writes in the middle of strings; it might write whole lines, or at least whole JSON tokens, so the problems mentioned above might never occur.
It's worth noting that the class only handles the escape format generated by Utf8JsonWriter: for speed, it doesn't decode sequences starting with "\U" or containing lowercase hexadecimal digits. Support for other formats can easily be added.
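For example, lowercase digits could be accepted with a helper like the following. It is a hypothetical sketch (`HexDigits.TryGetValue` is not part of the original class) that DecodeDigit could call instead of its two range checks, adding `value << shift` to `charCode` on success.

```csharp
public static class HexDigits
{
    // Maps an ASCII hex digit ('0'-'9', 'A'-'F' or 'a'-'f') to its value 0-15.
    public static bool TryGetValue(byte b, out int value)
    {
        if (b >= '0' && b <= '9') value = b - '0';
        else if (b >= 'A' && b <= 'F') value = b - 'A' + 10;
        else if (b >= 'a' && b <= 'f') value = b - 'a' + 10;
        else { value = 0; return false; }
        return true;
    }
}
```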
CAUTION: Utf8JsonWriter escapes these characters for a reason, namely security. Do not decode them if doing so might make the application vulnerable.
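using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.Json;
/// <summary>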
/// Decodes the Unicode escape sequences while writing UTF-8 stream.
/// </summary>
/// <remarks>
/// This is a workaround for a <see cref="Utf8JsonWriter"/> not doing it on its own.
/// </remarks>
public class Utf8DecodeStream : Stream {
/// <summary>
/// Creates a Unicode escape sequence decoding stream over a writeable stream.
/// </summary>
/// <param name="stream">A writeable stream.</param>
public Utf8DecodeStream(Stream stream) => InnerStream = stream;
#pragma warning disable CS1591
public override bool CanRead => InnerStream.CanRead;
public override bool CanSeek => InnerStream.CanSeek;
public override bool CanWrite => InnerStream.CanWrite;
public override long Length => InnerStream.Length;
public override long Position { get => InnerStream.Position; set => InnerStream.Position = value; }
public override void Flush() => InnerStream.Flush();
public override int Read(byte[] buffer, int offset, int count) => InnerStream.Read(buffer, offset, count);
public override long Seek(long offset, SeekOrigin origin) => InnerStream.Seek(offset, origin);
public override void SetLength(long value) => InnerStream.SetLength(value);
#pragma warning restore CS1591
/// <summary>
/// Writes the buffer with the Unicode sequences decoded.
/// </summary>
/// <param name="buffer">Buffer to write.</param>
/// <param name="offset">Position in the buffer to start.</param>
/// <param name="count">Number of bytes to write.</param>
public override void Write(byte[] buffer, int offset, int count) {
bool sequenceFound = false;
while (count > 0) {
sequenceFound = false;
for (int i = offset, n = offset + count; i < n; i++) {
if (DecodeUtf8Sequence(buffer, i, out var sequence, out var bytesConsumed)) {
InnerStream.Write(buffer, offset, i - offset);
count -= i - offset;
InnerStream.Write(sequence);
offset = i + bytesConsumed;
count -= bytesConsumed;
sequenceFound = true;
break;
}
}
if (!sequenceFound) {
InnerStream.Write(buffer, offset, count);
count = 0;
}
}
}
/// <summary>
/// Tries to decode one or more subsequent Unicode escape sequences into UTF-8 bytes.
/// </summary>
/// <param name="buffer">A buffer to decode.</param>
/// <param name="index">An index to start decoding from.</param>
/// <param name="result">An array containing UTF-8 representation of the sequence.</param>
/// <param name="bytesConsumed">The length of the matched escape sequence.</param>
/// <returns>True if one or more subsequent Unicode escape sequences is found.</returns>
private static bool DecodeUtf8Sequence(byte[] buffer, int index, out byte[] result, out int bytesConsumed) {
bytesConsumed = 0;
result = Array.Empty<byte>();
List<char> parts = new(2);
while (DecodeChar(buffer, index, out var part)) {
parts.Add(part);
index += 6;
bytesConsumed += 6;
}
if (parts.Count < 1) return false;
result = Encoding.UTF8.GetBytes(parts.ToArray());
return true;
}
/// <summary>
/// Tries to decode a single Unicode escape sequence.
/// </summary>
/// <remarks>
/// "\uXXXX" format is assumed for <see cref="Utf8JsonWriter"/> output.
/// </remarks>
/// <param name="buffer">A buffer to decode.</param>
/// <param name="index">An index to start decoding from.</param>
/// <param name="result">Decoded character.</param>
/// <returns>True if a single Unicode sequence is found at the specified index.</returns>
private static bool DecodeChar(byte[] buffer, int index, out char result) {
result = (char)0;
if (index + 6 > buffer.Length || buffer[index] != '\\' || buffer[index + 1] != 'u') return false; // a full "\uXXXX" needs 6 bytes: index .. index + 5
int charCode = 0;
for (int i = 0; i < 4; i++)
if (!DecodeDigit(i, buffer, index + 2, ref charCode)) return false;
result = (char)charCode;
return true;
}
/// <summary>
/// Tries to decode a single hexadecimal digit from a buffer.
/// </summary>
/// <remarks>
/// Upper case is assumed for <see cref="Utf8JsonWriter"/> output.
/// </remarks>
/// <param name="n">A zero-based digit index.</param>
/// <param name="buffer">Buffer to decode.</param>
/// <param name="index">Sequence index.</param>
/// <param name="charCode">Character code reference.</param>
/// <returns>True if the buffer contains a hexadecimal digit at <paramref name="index"/> + <paramref name="n"/>.</returns>
private static bool DecodeDigit(int n, byte[] buffer, int index, ref int charCode) {
var value = buffer[index + n];
var shift = 12 - (n << 2);
if (value is >= 48 and <= 57) charCode += (value - 48) << shift;
else if (value is >= 65 and <= 70) charCode += (value - 55) << shift;
else return false;
return true;
}
/// <summary>
/// Target stream.
/// </summary>
private readonly Stream InnerStream;
}
Usage:
using var writer = new Utf8JsonWriter(new Utf8DecodeStream(stream));
Example xUnit test:
[Fact]
public void Utf8DecodeStreamTest() {
var test = Encoding.UTF8.GetBytes(@"\uD83D\uDC15 \uD83D\uDC15 \uD83D\uDC15");
using var stream = new MemoryStream();
var decoding = new Utf8DecodeStream(stream);
decoding.Write(test);
decoding.Flush();
var result = Encoding.UTF8.GetString(stream.ToArray());
Assert.Equal("🐕 🐕 🐕", result);
}
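DecodeUtf8Sequence collects consecutive escapes before calling Encoding.UTF8.GetBytes because a character outside the Basic Multilingual Plane, such as the dog in the test (U+1F415), is written as two escapes forming a UTF-16 surrogate pair. A small sketch of why the halves must be encoded together; the SurrogateDemo class and its methods are hypothetical and exist only for illustration:

```csharp
using System;
using System.Globalization;
using System.Text;

public static class SurrogateDemo
{
    // Parses the four hex digits of a "\uXXXX" escape that starts at index.
    public static char ParseEscape(string text, int index) =>
        (char)int.Parse(text.Substring(index + 2, 4), NumberStyles.HexNumber);

    // Encodes both UTF-16 code units in one call, so the surrogate pair becomes one code point.
    public static byte[] EncodeTogether(char high, char low) =>
        Encoding.UTF8.GetBytes(new[] { high, low });

    // Encodes each code unit on its own; a lone surrogate is invalid and is replaced with U+FFFD.
    public static byte[] EncodeSeparately(char high, char low)
    {
        byte[] first = Encoding.UTF8.GetBytes(new[] { high });
        byte[] second = Encoding.UTF8.GetBytes(new[] { low });
        byte[] result = new byte[first.Length + second.Length];
        first.CopyTo(result, 0);
        second.CopyTo(result, first.Length);
        return result;
    }
}
```

For @"\uD83D\uDC15", EncodeTogether produces the 4-byte sequence F0-9F-90-95, while EncodeSeparately produces two replacement characters, EF-BF-BD-EF-BF-BD.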
When we added parallel processing to our application (a .NET service), we found unexpected behavior in the CRC calculation over text documents.
To isolate the issue I created a test case: the CRC calculation fails when it is invoked from a parallel loop, while replacing Parallel.ForEach with a standard foreach always works. I think I have to make some change to the CRC32 class implementation, but I need some help to understand the right way. Thanks.
This is the test method.
[TestMethod()]
public void Test_Crc_TestoDoc()
{
string query = @"select top 100 docId from sometable";
///key is document's id
///value is a couple, crc and text
Dictionary<int, Tuple<int, string>> docs = new Dictionary<int, Tuple<int, string>>();
using (SqlDataReader oSqlDataReader = Utility.ExecuteSP_Reader(query))
{
while (oSqlDataReader.Read())
{
int docId = oSqlDataReader.GetInt32(0);
///retrieve the text by docId
string docText = Utility.GetDocText(docId);
///calculate and add crc in dic
int CRC = CRC32.Compute(docText);
docs.Add(docId, new Tuple<int, string>(CRC, docText));
}
oSqlDataReader.Close();
}
///calculate crc 100 times to check if the value
///is always the same for same text
for (int i = 0; i < 100; i++)
{
Parallel.ForEach(docs.Keys,(int docId) =>
{
///crc saved in dictionary
int CRC1 = docs[docId].Item1;
///text saved in dictionary
string docText = docs[docId].Item2;
///calculate crc again, crc2 must be equal to crc1 stored in dictionary
int CRC2 = CRC32.Compute(docText);
Assert.AreEqual(CRC1, CRC2, $"crc not equal, why? docId->{docId} CRC1->{CRC1} CRC2->{CRC2}");
});
}
}
crc32 class:
public class CRC32 : HashAlgorithm
{
#region CONSTRUCTORS
/// <summary>Creates a CRC32 object using the <see cref="DefaultPolynomial"/>.</summary>
public CRC32()
: this(DefaultPolynomial)
{
}
/// <summary>Creates a CRC32 object using the specified polynomial.</summary>
/// <remarks>The polynomial should be supplied in its bit-reflected form. <see cref="DefaultPolynomial"/>.</remarks>
[CLSCompliant(false)]
public CRC32(uint polynomial)
{
HashSizeValue = 32;
_crc32Table = (uint[])_crc32TablesCache[polynomial];
if (_crc32Table == null)
{
_crc32Table = CRC32._buildCRC32Table(polynomial);
_crc32TablesCache.Add(polynomial, _crc32Table);
}
Initialize();
}
// static constructor
static CRC32()
{
_crc32TablesCache = Hashtable.Synchronized(new Hashtable());
_defaultCRC = new CRC32();
}
#endregion
#region PROPERTIES
/// <summary>Gets the default polynomial (used in WinZip, Ethernet, etc.)</summary>
/// <remarks>The default polynomial is a bit-reflected version of the standard polynomial 0x04C11DB7 used by WinZip, Ethernet, etc.</remarks>
[CLSCompliant(false)]
public static readonly uint DefaultPolynomial = 0xEDB88320; // Bitwise reflection of 0x04C11DB7;
#endregion
#region METHODS
/// <summary>Initializes an implementation of HashAlgorithm.</summary>
public override void Initialize()
{
_crc = _allOnes;
}
/// <summary>Routes data written to the object into the hash algorithm for computing the hash.</summary>
protected override void HashCore(byte[] buffer, int offset, int count)
{
for (int i = offset; i < offset + count; i++) // process exactly count bytes starting at offset
{
ulong ptr = (_crc & 0xFF) ^ buffer[i];
_crc >>= 8;
_crc ^= _crc32Table[ptr];
}
}
/// <summary>Finalizes the hash computation after the last data is processed by the cryptographic stream object.</summary>
protected override byte[] HashFinal()
{
byte[] finalHash = new byte[4];
ulong finalCRC = _crc ^ _allOnes;
finalHash[0] = (byte)((finalCRC >> 0) & 0xFF);
finalHash[1] = (byte)((finalCRC >> 8) & 0xFF);
finalHash[2] = (byte)((finalCRC >> 16) & 0xFF);
finalHash[3] = (byte)((finalCRC >> 24) & 0xFF);
return finalHash;
}
/// <summary>Computes the CRC32 value for the given ASCII string using the <see cref="DefaultPolynomial"/>.</summary>
public static int Compute(string asciiString)
{
_defaultCRC.Initialize();
return ToInt32(_defaultCRC.ComputeHash(asciiString));
}
/// <summary>Computes the CRC32 value for the given input stream using the <see cref="DefaultPolynomial"/>.</summary>
public static int Compute(Stream inputStream)
{
_defaultCRC.Initialize();
return ToInt32(_defaultCRC.ComputeHash(inputStream));
}
/// <summary>Computes the CRC32 value for the input data using the <see cref="DefaultPolynomial"/>.</summary>
public static int Compute(byte[] buffer)
{
_defaultCRC.Initialize();
return ToInt32(_defaultCRC.ComputeHash(buffer));
}
/// <summary>Computes the hash value for the input data using the <see cref="DefaultPolynomial"/>.</summary>
public static int Compute(byte[] buffer, int offset, int count)
{
_defaultCRC.Initialize();
return ToInt32(_defaultCRC.ComputeHash(buffer, offset, count));
}
/// <summary>Computes the hash value for the given ASCII string.</summary>
/// <remarks>The computation preserves the internal state between the calls, so it can be used for computation of a stream data.</remarks>
public byte[] ComputeHash(string asciiString)
{
byte[] rawBytes = ASCIIEncoding.ASCII.GetBytes(asciiString);
return ComputeHash(rawBytes);
}
/// <summary>Computes the hash value for the given input stream.</summary>
/// <remarks>The computation preserves the internal state between the calls, so it can be used for computation of a stream data.</remarks>
new public byte[] ComputeHash(Stream inputStream)
{
byte[] buffer = new byte[4096];
int bytesRead;
while ((bytesRead = inputStream.Read(buffer, 0, 4096)) > 0)
{
HashCore(buffer, 0, bytesRead);
}
return HashFinal();
}
/// <summary>Computes the hash value for the input data.</summary>
/// <remarks>The computation preserves the internal state between the calls, so it can be used for computation of a stream data.</remarks>
new public byte[] ComputeHash(byte[] buffer)
{
return ComputeHash(buffer, 0, buffer.Length);
}
/// <summary>Computes the hash value for the input data.</summary>
/// <remarks>The computation preserves the internal state between the calls, so it can be used for computation of a stream data.</remarks>
new public byte[] ComputeHash(byte[] buffer, int offset, int count)
{
HashCore(buffer, offset, count);
return HashFinal();
}
#endregion
#region PRIVATE SECTION
private static uint _allOnes = 0xffffffff;
private static CRC32 _defaultCRC;
private static Hashtable _crc32TablesCache;
private uint[] _crc32Table;
private uint _crc;
// Builds a crc32 table given a polynomial
private static uint[] _buildCRC32Table(uint polynomial)
{
uint crc;
uint[] table = new uint[256];
// One entry for each of the 256 possible byte values.
for (int i = 0; i < 256; i++)
{
crc = (uint)i;
for (int j = 8; j > 0; j--)
{
if ((crc & 1) == 1)
crc = (crc >> 1) ^ polynomial;
else
crc >>= 1;
}
table[i] = crc;
}
return table;
}
private static int ToInt32(byte[] buffer)
{
return BitConverter.ToInt32(buffer, 0);
}
#endregion
}
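The problem is most likely the shared static state rather than the static methods as such.
Every static Compute overload uses the single _defaultCRC instance, and HashCore accumulates into that instance's _crc field.
When two threads call Compute at the same time, one thread's Initialize() or HashCore can overwrite the running value of the other, so the result depends on timing.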
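One way to remove the race, sketched under the assumption that the static helpers may be replaced: keep the lookup table shared (it is only read after it is built) but hold the running CRC in a local variable. The Crc32Safe class is hypothetical. With the existing class, a simpler alternative is to avoid sharing an instance altogether, e.g. BitConverter.ToInt32(new CRC32().ComputeHash(docText), 0) inside the parallel loop.

```csharp
using System;
using System.Text;

// Hypothetical thread-safe replacement for the static CRC32.Compute helpers.
public static class Crc32Safe
{
    // Built once by the static initializer and only read afterwards, so it can be shared across threads.
    private static readonly uint[] Table = BuildTable(0xEDB88320);

    private static uint[] BuildTable(uint polynomial)
    {
        var table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint crc = i;
            for (int bit = 0; bit < 8; bit++)
                crc = (crc & 1) != 0 ? (crc >> 1) ^ polynomial : crc >> 1;
            table[i] = crc;
        }
        return table;
    }

    // The running CRC is a local variable, so concurrent calls cannot overwrite each other's state.
    public static int Compute(byte[] buffer, int offset, int count)
    {
        uint crc = 0xFFFFFFFF;
        for (int i = offset; i < offset + count; i++)
            crc = (crc >> 8) ^ Table[(crc ^ buffer[i]) & 0xFF];
        return unchecked((int)(crc ^ 0xFFFFFFFF));
    }

    // Same ASCII conversion as the original CRC32.Compute(string).
    public static int Compute(string asciiString)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(asciiString);
        return Compute(bytes, 0, bytes.Length);
    }
}
```

The table step is the same as in HashCore, so on a little-endian machine the value should match what CRC32.Compute returns when called from a single thread; for "123456789" it yields the standard CRC-32 check value 0xCBF43926.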
I've implemented a number of TOTP classes now and they all generate the wrong output. Below I've posted the code I used for the most simple one.
I'd like it to behave just like Google Authenticator, for example like the tool at https://gauth.apps.gbraad.nl/#main.
What I want is that, in the front end of the application, a user enters the secret "BANANAKEY123", which translates to the base32 string "IJAU4QKOIFFUKWJRGIZQ====".
In the constructor below, key would then be "BANANAKEY123". Yet for some reason this code does not generate the same OTP codes as the GAuth OTP tool does.
The only two plausible mistakes would be that
var secretKeyBytes = Base32Encode(secretKey);
is wrong, or that my timing function is wrong. I checked both and couldn't find the fault in either. Could someone please point me in the right direction? Thank you!
public class Totp
{
private readonly int digits = 6;
private readonly HMACSHA1 hmac;
private readonly HMACSHA256 hmac256;
private readonly Int32 t1 = 30;
internal int mode;
private string secret;
private const string allowedCharacters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
public Totp(string key, int mode)
{
secret = key;
this.mode = mode;
}
// defaults to SHA-1
public Totp(string key)
{
secret = key;
this.mode = 1;
}
public Totp(string base32string, Int32 t1, int digits) : this(base32string)
{
this.t1 = t1;
this.digits = digits;
}
public Totp(string base32string, Int32 t1, int digits, int mode) : this(base32string, mode)
{
this.t1 = t1;
this.digits = digits;
}
public String getCodeString()
{
return GetCode(this.secret, GetInterval(DateTime.UtcNow));
}
private static long GetInterval(DateTime dateTime)
{
TimeSpan elapsedTime = dateTime.ToUniversalTime() - new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
return (long)elapsedTime.TotalSeconds / 30;
}
private static string GetCode(string secretKey, long timeIndex)
{
var secretKeyBytes = Base32Encode(secretKey);
HMACSHA1 hmac = new HMACSHA1(secretKeyBytes);
byte[] challenge = BitConverter.GetBytes(timeIndex);
if (BitConverter.IsLittleEndian) Array.Reverse(challenge);
byte[] hash = hmac.ComputeHash(challenge);
int offset = hash[19] & 0xf;
int truncatedHash = hash[offset] & 0x7f;
for (int i = 1; i < 4; i++)
{
truncatedHash <<= 8;
truncatedHash |= hash[offset + i] & 0xff;
}
truncatedHash %= 1000000;
return truncatedHash.ToString("D6");
}
private static byte[] Base32Encode(string source)
{
var bits = source.ToUpper().ToCharArray().Select(c =>
Convert.ToString(allowedCharacters.IndexOf(c), 2).PadLeft(5, '0')).Aggregate((a, b) => a + b);
return Enumerable.Range(0, bits.Length / 8).Select(i => Convert.ToByte(bits.Substring(i * 8, 8), 2)).ToArray();
}
}
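One thing worth checking: despite its name, Base32Encode performs base32 decoding (it maps each character to its 5-bit index in the alphabet). Applied to the plain secret "BANANAKEY123", it breaks on the digit '1', which is not in the base32 alphabet: IndexOf returns -1 and Convert.ToString(-1, 2) yields 32 one-bits instead of 5. The '=' padding of "IJAU4QKOIFFUKWJRGIZQ====" would hit the same problem. The HMAC key should be the raw bytes of "BANANAKEY123", which is exactly what the base32 string decodes to. A minimal RFC 4648 decoder sketch; the Base32 class name is hypothetical:

```csharp
using System;

public static class Base32
{
    private const string Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

    // Decodes base32 text into bytes, ignoring trailing '=' padding.
    public static byte[] Decode(string input)
    {
        string s = input.TrimEnd('=').ToUpperInvariant();
        byte[] output = new byte[s.Length * 5 / 8];
        int buffer = 0, bitsInBuffer = 0, index = 0;
        foreach (char c in s)
        {
            int value = Alphabet.IndexOf(c);
            if (value < 0)
                throw new FormatException($"'{c}' is not a base32 character.");
            buffer = (buffer << 5) | value; // append 5 bits
            bitsInBuffer += 5;
            if (bitsInBuffer >= 8)
            {
                bitsInBuffer -= 8;
                output[index++] = (byte)(buffer >> bitsInBuffer); // emit the oldest 8 bits
                buffer &= (1 << bitsInBuffer) - 1;                // keep only unconsumed bits
            }
        }
        return output;
    }
}
```

With this, new HMACSHA1(Base32.Decode("IJAU4QKOIFFUKWJRGIZQ====")) uses the same key bytes as new HMACSHA1(Encoding.ASCII.GetBytes("BANANAKEY123")).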
I have been using this code for quite some time to generate Time-based OTP, hope it helps.
TotpAuthenticationService.cs
using System;
using System.Net;
using System.Security.Cryptography;
using System.Text;
namespace Wteen.Infrastructure.Services
{
/// <summary>
/// A time-based implementation of RFC 6238 (TOTP), a variant of the HMAC-based one-time password (HOTP) algorithm, with a default code lifetime of 30 seconds.
/// </summary>
public sealed class TotpAuthenticationService
{
private readonly Encoding _encoding;
private readonly int _length;
private readonly TimeSpan _timestep;
private readonly DateTime _unixEpoch;
/// <summary>
/// Creates a new instance of <see cref="TotpAuthenticationService"/>
/// </summary>
/// <param name="length">The number of digits in the OTP.</param>
/// <param name="duration">The time step, in seconds, during which generated OTPs keep the same value.</param>
public TotpAuthenticationService(int length, int duration = 30)
{
_length = length;
_encoding = new UTF8Encoding(false, true);
_timestep = TimeSpan.FromSeconds(duration);
_unixEpoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
}
/// <summary>
/// The current time step number
/// </summary>
private ulong CurrentTimeStepNumber => (ulong)(TimeElapsed.Ticks / _timestep.Ticks);
/// <summary>
/// The time elapsed since midnight UTC on January 1, 1970 (the Unix epoch).
/// </summary>
private TimeSpan TimeElapsed => DateTime.UtcNow - _unixEpoch;
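/// <summary>
/// Generates the OTP for the current time step.
/// </summary>
/// <param name="securityToken">The user's shared secret as raw key bytes.</param>
/// <param name="modifier">Optional value appended to the time-step bytes before hashing.</param>
/// <returns>The OTP as an integer; leading zeros are not preserved.</returns>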
public int GenerateCode(byte[] securityToken, string modifier = null)
{
if (securityToken == null)
throw new ArgumentNullException(nameof(securityToken));
using (var hmacshA1 = new HMACSHA1(securityToken))
{
return ComputeTotp(hmacshA1, CurrentTimeStepNumber, modifier);
}
}
/// <summary>
/// Validates a code against the current time step and up to <paramref name="timeSteps"/> steps before and after it.
/// </summary>
/// <param name="securityToken">The user's shared secret as raw key bytes.</param>
/// <param name="code">The code to validate.</param>
/// <param name="timeSteps">The number of time steps on either side of the current one for which <paramref name="code"/> is accepted.</param>
/// <param name="channel">Optional modifier, such as the user's email address or mobile number the code was sent to; it must match the modifier used when generating the code.</param>
/// <returns>True if the code matches any time step in the window; otherwise false.</returns>
public bool ValidateCode(byte[] securityToken, int code, int timeSteps, string channel = null)
{
if (securityToken == null)
throw new ArgumentNullException(nameof(securityToken));
using (var hmacshA1 = new HMACSHA1(securityToken))
{
for (var index = -timeSteps; index <= timeSteps; ++index)
if (ComputeTotp(hmacshA1, CurrentTimeStepNumber + (ulong)index, channel) == code)
return true;
}
return false;
}
private byte[] ApplyModifier(byte[] input, string modifier)
{
if (string.IsNullOrEmpty(modifier))
return input;
var bytes = _encoding.GetBytes(modifier);
var numArray = new byte[checked(input.Length + bytes.Length)];
Buffer.BlockCopy(input, 0, numArray, 0, input.Length);
Buffer.BlockCopy(bytes, 0, numArray, input.Length, bytes.Length);
return numArray;
}
private int ComputeTotp(HashAlgorithm algorithm, ulong timestepNumber, string modifier)
{
var bytes = BitConverter.GetBytes(IPAddress.HostToNetworkOrder((long)timestepNumber));
var hash = algorithm.ComputeHash(ApplyModifier(bytes, modifier));
var index = hash[hash.Length - 1] & 15;
return (((hash[index] & sbyte.MaxValue) << 24) | ((hash[index + 1] & byte.MaxValue) << 16) | ((hash[index + 2] & byte.MaxValue) << 8) | (hash[index + 3] & byte.MaxValue)) % (int)Math.Pow(10, _length);
}
}
}
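To compare either implementation against a known value, it helps to make the time step an explicit parameter instead of reading DateTime.UtcNow. The sketch below (the TotpSketch class is hypothetical) performs the same big-endian counter encoding and dynamic truncation as ComputeTotp, without a modifier, and can be checked against the RFC 6238 SHA-1 test vector: ASCII key "12345678901234567890" at Unix time 59 (time step 1) gives 94287082 for 8 digits.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class TotpSketch
{
    // Number of whole periods elapsed since the Unix epoch.
    public static long TimeStep(DateTimeOffset time, int periodSeconds = 30) =>
        time.ToUnixTimeSeconds() / periodSeconds;

    // HOTP value (RFC 4226) for an explicit counter; TOTP uses the time step as the counter.
    public static int Compute(byte[] key, long timeStep, int digits)
    {
        byte[] counter = BitConverter.GetBytes(timeStep);
        if (BitConverter.IsLittleEndian) Array.Reverse(counter); // counter must be big-endian
        using var hmac = new HMACSHA1(key);
        byte[] hash = hmac.ComputeHash(counter);
        int offset = hash[hash.Length - 1] & 0x0F;               // dynamic truncation offset
        int binary = ((hash[offset] & 0x7F) << 24)
                   | (hash[offset + 1] << 16)
                   | (hash[offset + 2] << 8)
                   | hash[offset + 3];
        int modulus = 1;
        for (int i = 0; i < digits; i++) modulus *= 10;
        return binary % modulus;
    }

    // Formats the code with leading zeros, as authenticator apps display it.
    public static string ComputeString(byte[] key, long timeStep, int digits) =>
        Compute(key, timeStep, digits).ToString("D" + digits);
}
```

For the secret in the question, the key would be Encoding.ASCII.GetBytes("BANANAKEY123"), which is what "IJAU4QKOIFFUKWJRGIZQ====" decodes to. Note that GenerateCode in the service above returns an int, so a code such as 012345 needs the same "D6" formatting before it is compared with an authenticator app.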
I'm working on an Objective-C app that receives GPS routes from a server. The routes are encoded, and I have decoding scripts written in both PHP and C#.
To my knowledge, it's not possible to compile or import these scripts into the Xcode project.
I have attached the two scripts below alongside a copy of the encoded GPS route.
I've studied the script considerably and understand (for the most part) what is happening. Is it feasible to port this code to Objective-C? If so, what would be the best approach? And how would I port the byte-stream handling?
I'm slightly confused since I have never worked with encoding/decoding before; any help is highly appreciated.
Encoded GPS Route
BANZ1OIkuAAAAAAAAAAAAAAAJAP+S1s5CQxSBwAiljQJ/8//RgIsBA7bAjQF//r/MQQsBPjSAjQP/6//RAIsAvjbAjQR/6j/PgAsB8q2ACwI75D+LAXow/4sBd2V/iwLnZcANA7/cf9y/iwKnpH+NCX/g/9s/DQP/4T/bvw0C/+1/3H8LArVqPwsBPPO/iwC/foANBX/tf9U+DQY/2z/nPw0F//X/zjqNBL/wP8+6CwE+9P6LBkhxvo0Ev9U/6XyLAWtFgI0I/9+AB4CLAfKKwQ0Gv9nAHAELATdFQAsBdIcAiwQkEn8LATgFf4sD6kj+CwUjFr6NAb/+gCfBiwN7+j+LAcdhf4sAQ3vADQM/+T/Q/AsAu/l/DQK/7r/c+4sDd2E9CwIwbn0NAX/dP+v6jQR/4n/augsCcCP+CwG0N36LAHzAP40Gv+w/1b0LAv2u/4sBN3i/CwQu6X2LBEOigA0LP/z/0z+NBYAJP84GjQcAH7/eR4sA//iAjQXABj/TRY0Hf/Q/1EONB8AP/9KGDQVAIb/fho0GACP/4oQLBFtogosAx7NBDQNAKD/lw4sCHngDCwGLcQGLAEN8wIsBTTMCCwDAdMCLAc5swosBCPhBCwCCvoALA9howwsCUO1CDQLAKH/pggsAyvYADQIAHz/XAQsAiDiAiwFMvcALAMd9wIsBgPEAjQTAAX/KAo0BwAp/yv2LATli/osAwPb/jQJAF7/OAY0CQBB/0EINA4Acf9RDiwHDqMGLAnmrgY0EP/G/0EMLAfhqQQsBP3RBCwGGKMOLAUmwwwsAQDyAiwJH6IQLAIX9wQsCVsPDCwFFisANBQAhACLAjQMAFIAxvQsAQoKACwFLywCNA8AfACBADQUAJQAhgQsClNhADQJALUAOw4sCEpIBjQPAJ4AZBIsCFokDCwJaZUOLAEG9wA0KQCM/+MSNBwAeQCMDDQQAHr/XxA0JACX/7cSLAQd5AQsBAfUAiwEG94CNBH/+/83BiwCBvAANCAAgP/QDCwgPHkCLAgEXf40HP/0AMD6LAIRHQIsB+RM/CwBAAj+LAztefwsC/Z7/jQTAEkAtAIsBAMxADQS/+EAjvwsB9FV/CwJ+V/+NBUACwDL/CwB+wcALCEMX/4sBPQ9/jQa/6gAofY0CP/fAMf4LAbMXvYsIUg5AjQKALn/1xAsAhH+AiwFMvsENB0Aqf/cDiwbaKEMLBuxGfosBu9M+jQT/5YApOwsFLOpADQH/24AIvQsBpAq8iwD+vkANBIArv+oFiwIScIKLAZEoQosB2m/CCwBBfwCNCcAswAEDDQOAK//vQYsB34W/CwEEA/+NA0Awf/s+iwFQhj8NAkAgf/G/iwDLQX+NBQAuf//ADQOALkAARYsBTUVBjQkAIIACw4sBT4CCCwEJO4GLAIY+gIsB1IfCDQWALMARRIsBToJBCwfZEgKLAPoJ/wsA+oa/CwM5lD+LATjGvwsAu/+/jQO/08AQuwsAuAJ/CwIqCP2LAa+DfgsA/ID/jQV/3cAfOw0G/+iAJbwLAXZfPY0Df++ALjyNBT/jACe7jQT/8YAvfgsBMr/ACwD1NoALAmVvQAsAf/+ADQaAIsAif40CAAZANn4LAYLafwsA+Yp/CwD5xH+LATlJf4sA/Yb/iwF8yj8LBpBJQAsAh3+AiwDKR4ANBAAagCM/iwFJQECNBMAugAkDiwGTTYELAMVJQAsClwbBjQlAIQAdQQ0GwCbAFYKNBYApwA6DDQPAKwAWAQsBELlCCwBAgAANBb/xgDS6jQI/8wAhPIsBdk2+CwMtGTyNBUADADF8DQV/8QAwuw0EP/fAL/2LAMlLAA0CQC6AAUKLAlYCgYsB0YMBjQPAML/0hIsAQ3/AiwJZfEKNA0AsQBTCiwCFAQCLAED/wA0CgCgAHIKNAsAlwB8CCwHRUIELAdgBgosCWYvCCwJVy4INAoAgwBoCCwDJREANAoAnQCCADQHAEsAzfo0DAB0AKH+
NAsAqQBzAjQGAHgAov40DQB6AJ/+NAgAnQCGACwDPCQANA4AngAyAiwEYhQANAgAxQAABDQIALwACQI0CAC4/98GLARE3AQsBVknAjQHAJEAkAI0Bv/wAN36NAX/+gDV+jQF//8A4PY0BgAGAOr0NAb/6wDd/DQH/+YA0fo0Bf/oAPb+NAP/+wCS/CwDAjT8NMoAGgEF9jQx/8T/YwQ0GAAr/zgMLAUP4AA=
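Porting the byte-stream logic mostly comes down to reading big-endian integers of 1, 2, 3, 4 and 6 bytes. The C# decoder does this with BitConverter plus Reverse(); the same values can be assembled with plain shifts, which carry over almost unchanged to C and therefore to Objective-C (for example over the bytes of an NSData). A sketch with a hypothetical BigEndian helper class:

```csharp
using System;

// Hypothetical helpers: explicit big-endian reads, equivalent to BitConverter + Reverse().
public static class BigEndian
{
    // 6-byte unsigned time value (milliseconds since 1900-01-01 UTC in the full time mode).
    public static ulong ReadUInt48(byte[] b, int i) =>
        ((ulong)b[i] << 40) | ((ulong)b[i + 1] << 32) | ((ulong)b[i + 2] << 24) |
        ((ulong)b[i + 3] << 16) | ((ulong)b[i + 4] << 8) | b[i + 5];

    // 4-byte signed latitude/longitude in microdegrees.
    public static int ReadInt32(byte[] b, int i) =>
        (b[i] << 24) | (b[i + 1] << 16) | (b[i + 2] << 8) | b[i + 3];

    // 3-byte signed altitude: place the bytes at the top of an int, then an
    // arithmetic right shift sign-extends the value.
    public static int ReadInt24(byte[] b, int i) =>
        ((b[i] << 24) | (b[i + 1] << 16) | (b[i + 2] << 8)) >> 8;

    // 2-byte signed delta.
    public static short ReadInt16(byte[] b, int i) =>
        unchecked((short)((b[i] << 8) | b[i + 1]));

    // 1-byte signed delta.
    public static sbyte ReadInt8(byte[] b, int i) => unchecked((sbyte)b[i]);
}
```

As an example, the route above starts with "BANZ1OIkuAAA", which base64-decodes to the header byte 0x04 (only bit 2 set: altitude present, full time and full position) followed by the 6-byte time 0x0359D4E224B8.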
C# Decode Script
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
// Classes for managing IOF XML 3.0 Route element.
// IOF XML 3.0 specification: http://code.google.com/p/iofdatastandard/source/browse/trunk/IOF.xsd.
// IOF XML 3.0 example result list file with Route element: http://code.google.com/p/iofdatastandard/source/browse/trunk/Examples/ResultList1.xml.
/// <summary>
/// Class representing a route, including logic for converting to/from an IOF XML 3.0 route stored in binary format.
/// </summary>
public class IofXml30Route
{
private double? length;
private IEnumerable<IofXml30Waypoint> waypoints = new List<IofXml30Waypoint>();
/// <summary>
/// The waypoints of the route.
/// </summary>
public IEnumerable<IofXml30Waypoint> Waypoints
{
get { return waypoints; }
set { waypoints = value ?? new List<IofXml30Waypoint>(); }
}
/// <summary>
/// Writes the route in IOF XML 3.0 binary format to the specified stream.
/// </summary>
/// <param name="stream">The stream to write to.</param>
public void WriteToStream(Stream stream)
{
IofXml30Waypoint previousWaypoint = null;
foreach (var waypoint in Waypoints)
{
waypoint.WriteToStream(stream, previousWaypoint);
previousWaypoint = waypoint;
}
}
/// <summary>
/// Converts the route to IOF XML 3.0 binary format and returns it as a base64-encoded string.
/// </summary>
/// <param name="formattingOptions">The formatting options for the base64-encoded string.</param>
public string ToBase64String(Base64FormattingOptions formattingOptions = Base64FormattingOptions.None)
{
return Convert.ToBase64String(ToByteArray(), formattingOptions);
}
/// <summary>
/// Converts the route to IOF XML 3.0 binary format and returns it as a byte array.
/// </summary>
public byte[] ToByteArray()
{
using (var ms = new MemoryStream())
{
WriteToStream(ms);
return ms.ToArray();
}
}
/// <summary>
/// Reads a route in IOF XML 3.0 binary format from a stream.
/// </summary>
/// <param name="stream">The stream to read from.</param>
public static IofXml30Route FromStream(Stream stream)
{
var waypoints = new List<IofXml30Waypoint>();
while (stream.Position < stream.Length)
{
waypoints.Add(IofXml30Waypoint.FromStream(stream, waypoints.LastOrDefault()));
}
return new IofXml30Route() { Waypoints = waypoints };
}
/// <summary>
/// Reads a route in IOF XML 3.0 binary format from a base64-encoded string.
/// </summary>
/// <param name="base64String">The base64-encoded string to read from.</param>
public static IofXml30Route FromBase64String(string base64String)
{
return FromByteArray(Convert.FromBase64String(base64String));
}
/// <summary>
/// Reads a route in IOF XML 3.0 binary format from a byte array.
/// </summary>
/// <param name="bytes">The bytes to read from.</param>
public static IofXml30Route FromByteArray(byte[] bytes)
{
using (var ms = new MemoryStream(bytes))
{
return FromStream(ms);
}
}
/// <summary>
/// Gets the length of the route in meters.
/// </summary>
public double Length
{
get { return length ?? (length = CalculateLength()).Value; }
}
/// <summary>
/// Gets the start time of the route.
/// </summary>
public DateTime StartTime
{
get { return Waypoints.Any() ? Waypoints.First().Time : DateTime.MinValue; }
}
/// <summary>
/// Gets the end time of the route.
/// </summary>
public DateTime EndTime
{
get { return Waypoints.Any() ? Waypoints.Last().Time : DateTime.MinValue; }
}
/// <summary>
/// Gets the duration of the route.
/// </summary>
public TimeSpan Duration
{
get { return EndTime - StartTime; }
}
private double CalculateLength()
{
var sum = 0.0;
var wpList = Waypoints.ToList();
for (var i = 1; i < wpList.Count; i++)
{
sum += GetDistanceBetweenWaypoints(wpList[i - 1], wpList[i]);
}
return sum;
}
private static double GetDistanceBetweenWaypoints(IofXml30Waypoint w1, IofXml30Waypoint w2)
{
// use spherical coordinates: rho, phi, theta
const double rho = 6378200; // earth radius in metres
double sinPhi0 = Math.Sin(0.5 * Math.PI + w1.Latitude / 180.0 * Math.PI);
double cosPhi0 = Math.Cos(0.5 * Math.PI + w1.Latitude / 180.0 * Math.PI);
double sinTheta0 = Math.Sin(w1.Longitude / 180.0 * Math.PI);
double cosTheta0 = Math.Cos(w1.Longitude / 180.0 * Math.PI);
double sinPhi1 = Math.Sin(0.5 * Math.PI + w2.Latitude / 180.0 * Math.PI);
double cosPhi1 = Math.Cos(0.5 * Math.PI + w2.Latitude / 180.0 * Math.PI);
double sinTheta1 = Math.Sin(w2.Longitude / 180.0 * Math.PI);
double cosTheta1 = Math.Cos(w2.Longitude / 180.0 * Math.PI);
var x1 = rho * sinPhi0 * cosTheta0;
var y1 = rho * sinPhi0 * sinTheta0;
var z1 = rho * cosPhi0;
var x2 = rho * sinPhi1 * cosTheta1;
var y2 = rho * sinPhi1 * sinTheta1;
var z2 = rho * cosPhi1;
return DistancePointToPoint(x1, y1, z1, x2, y2, z2);
}
private static double DistancePointToPoint(double x1, double y1, double z1, double x2, double y2, double z2)
{
var sum = (x2 - x1)*(x2 - x1) + (y2 - y1)*(y2 - y1) + (z2 - z1)*(z2 - z1);
return Math.Sqrt(sum);
}
}
/// <summary>
/// Class representing a waypoint, including logic for converting to/from an IOF XML 3.0 waypoint stored in binary format.
/// </summary>
public class IofXml30Waypoint
{
private static readonly DateTime zeroTime = new DateTime(1900, 01, 01, 00, 00, 00, DateTimeKind.Utc);
private const long timeSecondsThreshold = 255;
private const long timeMillisecondsThreshold = 65535;
private const int lanLngBigDeltaLowerThreshold = -32768;
private const int lanLngBigDeltaUpperThreshold = 32767;
private const int lanLngSmallDeltaLowerThreshold = -128;
private const int lanLngSmallDeltaUpperThreshold = 127;
private const int altitudeDeltaLowerThreshold = -128;
private const int altitudeDeltaUpperThreshold = 127;
/// <summary>
/// Gets or sets the type of the waypoint; normal or interruption.
/// </summary>
public IofXml30WaypointType Type { get; set; }
/// <summary>
/// Gets or sets the time when the waypoint was recorded.
/// </summary>
public DateTime Time { get; set; }
/// <summary>
/// Gets or sets the latitude of the waypoint.
/// </summary>
public double Latitude { get; set; }
/// <summary>
/// Gets or sets the longitude of the waypoint.
/// </summary>
public double Longitude { get; set; }
/// <summary>
/// Gets or sets the altitude of the waypoint.
/// </summary>
public double? Altitude { get; set; }
/// <summary>
/// Gets or sets the time when the waypoint was recorded in the internal storage mode.
/// </summary>
public ulong StorageTime
{
get { return (ulong)Math.Round((Time - zeroTime).TotalMilliseconds); }
set { Time = zeroTime.AddMilliseconds(value); }
}
/// <summary>
/// Gets or sets the latitude of the waypoint in the internal storage mode.
/// </summary>
public int StorageLatitude
{
get { return (int)Math.Round(Latitude * 1000000); }
set { Latitude = (double)value / 1000000; }
}
/// <summary>
/// Gets or sets the longitude of the waypoint in the internal storage mode.
/// </summary>
public int StorageLongitude
{
get { return (int)Math.Round(Longitude * 1000000); }
set { Longitude = (double)value / 1000000; }
}
/// <summary>
/// Gets or sets the altitude of the waypoint in the internal storage mode.
/// </summary>
public int? StorageAltitude
{
get { return Altitude == null ? (int?)null : (int)Math.Round(Altitude.Value * 10); }
set { Altitude = value == null ? (double?)null : (double)value / 10; }
}
/// <summary>
/// Writes the waypoint in IOF XML 3.0 binary format to a stream.
/// </summary>
/// <param name="stream">The stream to write to.</param>
/// <param name="previousWaypoint">The previous waypoint of the route, or null if this is the first waypoint.</param>
public void WriteToStream(Stream stream, IofXml30Waypoint previousWaypoint)
{
var timeStorageMode = TimeStorageMode.Full;
if (previousWaypoint != null)
{
if ((StorageTime - previousWaypoint.StorageTime) % 1000 == 0 && (StorageTime - previousWaypoint.StorageTime) / 1000 <= timeSecondsThreshold)
{
timeStorageMode = TimeStorageMode.Seconds;
}
else if (StorageTime - previousWaypoint.StorageTime <= timeMillisecondsThreshold)
{
timeStorageMode = TimeStorageMode.Milliseconds;
}
}
var positionStorageMode = PositionStorageMode.Full;
if (previousWaypoint != null &&
(StorageAltitude == null || (previousWaypoint.StorageAltitude != null && StorageAltitude - previousWaypoint.StorageAltitude >= altitudeDeltaLowerThreshold && StorageAltitude - previousWaypoint.StorageAltitude <= altitudeDeltaUpperThreshold)))
{
if (StorageLatitude - previousWaypoint.StorageLatitude >= lanLngSmallDeltaLowerThreshold && StorageLatitude - previousWaypoint.StorageLatitude <= lanLngSmallDeltaUpperThreshold &&
StorageLongitude - previousWaypoint.StorageLongitude >= lanLngSmallDeltaLowerThreshold && StorageLongitude - previousWaypoint.StorageLongitude <= lanLngSmallDeltaUpperThreshold)
{
positionStorageMode = PositionStorageMode.SmallDelta;
}
else if (StorageLatitude - previousWaypoint.StorageLatitude >= lanLngBigDeltaLowerThreshold && StorageLatitude - previousWaypoint.StorageLatitude <= lanLngBigDeltaUpperThreshold &&
StorageLongitude - previousWaypoint.StorageLongitude >= lanLngBigDeltaLowerThreshold && StorageLongitude - previousWaypoint.StorageLongitude <= lanLngBigDeltaUpperThreshold)
{
positionStorageMode = PositionStorageMode.BigDelta;
}
}
var headerByte = 0;
if (Type == IofXml30WaypointType.Interruption) headerByte |= (1 << 7);
if (timeStorageMode == TimeStorageMode.Milliseconds) headerByte |= (1 << 6);
if (timeStorageMode == TimeStorageMode.Seconds) headerByte |= (1 << 5);
if (positionStorageMode == PositionStorageMode.BigDelta) headerByte |= (1 << 4);
if (positionStorageMode == PositionStorageMode.SmallDelta) headerByte |= (1 << 3);
if (StorageAltitude != null) headerByte |= (1 << 2);
// header byte
stream.WriteByte((byte)headerByte);
// time byte(s)
switch (timeStorageMode)
{
case TimeStorageMode.Full: // 6 bytes
stream.Write(BitConverter.GetBytes(StorageTime).Reverse().ToArray(), 2, 6);
break;
case TimeStorageMode.Milliseconds: // 2 bytes
stream.Write(BitConverter.GetBytes((ushort)(StorageTime - previousWaypoint.StorageTime)).Reverse().ToArray(), 0, 2);
break;
case TimeStorageMode.Seconds: // 1 byte
stream.WriteByte((byte)((StorageTime - previousWaypoint.StorageTime) / 1000));
break;
}
// position bytes
switch (positionStorageMode)
{
case PositionStorageMode.Full: // 4 + 4 + 3 bytes
stream.Write(BitConverter.GetBytes(StorageLatitude).Reverse().ToArray(), 0, 4);
stream.Write(BitConverter.GetBytes(StorageLongitude).Reverse().ToArray(), 0, 4);
if (StorageAltitude != null) stream.Write(BitConverter.GetBytes(StorageAltitude.Value).Reverse().ToArray(), 1, 3);
break;
case PositionStorageMode.BigDelta: // 2 + 2 + 1 bytes
stream.Write(BitConverter.GetBytes((short)(StorageLatitude - previousWaypoint.StorageLatitude)).Reverse().ToArray(), 0, 2);
stream.Write(BitConverter.GetBytes((short)(StorageLongitude - previousWaypoint.StorageLongitude)).Reverse().ToArray(), 0, 2);
if (StorageAltitude != null) stream.Write(BitConverter.GetBytes((sbyte)(StorageAltitude - previousWaypoint.StorageAltitude).Value), 0, 1);
break;
case PositionStorageMode.SmallDelta: // 1 + 1 + 1 bytes
stream.Write(BitConverter.GetBytes((sbyte)(StorageLatitude - previousWaypoint.StorageLatitude)), 0, 1);
stream.Write(BitConverter.GetBytes((sbyte)(StorageLongitude - previousWaypoint.StorageLongitude)), 0, 1);
if (StorageAltitude != null) stream.Write(BitConverter.GetBytes((sbyte)(StorageAltitude - previousWaypoint.StorageAltitude).Value), 0, 1);
break;
}
}
/// <summary>
/// Reads a waypoint in IOF XML 3.0 binary format from a stream.
/// </summary>
/// <param name="stream">The stream to read from.</param>
/// <param name="previousWaypoint">The previous waypoint of the route, or null if this is the first waypoint.</param>
/// <returns>The decoded waypoint.</returns>
public static IofXml30Waypoint FromStream(Stream stream, IofXml30Waypoint previousWaypoint)
{
var waypoint = new IofXml30Waypoint();
// header byte
var headerByte = stream.ReadByte();
waypoint.Type = (headerByte & (1 << 7)) == 0 ? IofXml30WaypointType.Normal : IofXml30WaypointType.Interruption;
var timeStorageMode = TimeStorageMode.Full;
if ((headerByte & (1 << 6)) > 0)
{
timeStorageMode = TimeStorageMode.Milliseconds;
}
else if ((headerByte & (1 << 5)) > 0)
{
timeStorageMode = TimeStorageMode.Seconds;
}
var positionStorageMode = PositionStorageMode.Full;
if ((headerByte & (1 << 4)) > 0)
{
positionStorageMode = PositionStorageMode.BigDelta;
}
else if ((headerByte & (1 << 3)) > 0)
{
positionStorageMode = PositionStorageMode.SmallDelta;
}
var altitudePresent = (headerByte & (1 << 2)) > 0;
byte[] bytes;
int b;
// time byte(s)
switch (timeStorageMode)
{
case TimeStorageMode.Full: // 6 bytes
bytes = new byte[8];
stream.Read(bytes, 2, 6);
waypoint.StorageTime = BitConverter.ToUInt64(bytes.Reverse().ToArray(), 0);
break;
case TimeStorageMode.Milliseconds: // 2 bytes
bytes = new byte[2];
stream.Read(bytes, 0, 2);
waypoint.StorageTime = previousWaypoint.StorageTime + BitConverter.ToUInt16(bytes.Reverse().ToArray(), 0);
break;
case TimeStorageMode.Seconds: // 1 byte
b = stream.ReadByte();
waypoint.StorageTime = previousWaypoint.StorageTime + (ulong)b * 1000;
break;
}
// position bytes
switch (positionStorageMode)
{
case PositionStorageMode.Full: // 4 + 4 + 3 bytes
bytes = new byte[4];
stream.Read(bytes, 0, 4);
waypoint.StorageLatitude = BitConverter.ToInt32(bytes.Reverse().ToArray(), 0);
bytes = new byte[4];
stream.Read(bytes, 0, 4);
waypoint.StorageLongitude = BitConverter.ToInt32(bytes.Reverse().ToArray(), 0);
if (altitudePresent)
{
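bytes = new byte[4];
stream.Read(bytes, 1, 3);
// the altitude is a 3-byte signed integer: sign-extend it into the top byte
if ((bytes[1] & 0x80) != 0) bytes[0] = 0xFF;
waypoint.StorageAltitude = BitConverter.ToInt32(bytes.Reverse().ToArray(), 0);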
}
break;
case PositionStorageMode.BigDelta: // 2 + 2 + 1 bytes
bytes = new byte[2];
stream.Read(bytes, 0, 2);
waypoint.StorageLatitude = previousWaypoint.StorageLatitude + BitConverter.ToInt16(bytes.Reverse().ToArray(), 0);
bytes = new byte[2];
stream.Read(bytes, 0, 2);
waypoint.StorageLongitude = previousWaypoint.StorageLongitude + BitConverter.ToInt16(bytes.Reverse().ToArray(), 0);
if (altitudePresent)
{
b = stream.ReadByte();
waypoint.StorageAltitude = previousWaypoint.StorageAltitude + (sbyte)b;
}
break;
case PositionStorageMode.SmallDelta: // 1 + 1 + 1 bytes
b = stream.ReadByte();
waypoint.StorageLatitude = previousWaypoint.StorageLatitude + (sbyte)b;
b = stream.ReadByte();
waypoint.StorageLongitude = previousWaypoint.StorageLongitude + (sbyte)b;
if (altitudePresent)
{
b = stream.ReadByte();
waypoint.StorageAltitude = previousWaypoint.StorageAltitude + (sbyte)b;
}
break;
}
return waypoint;
}
/// <summary>
/// The storage mode for the time of a waypoint.
/// </summary>
private enum TimeStorageMode
{
/// <summary>
/// The time is stored as a 6-byte unsigned integer, and shows the number of milliseconds since January 1, 1900, 00:00:00 UTC.
/// </summary>
Full,
/// <summary>
/// The time is stored as a 1-byte unsigned integer, and shows the number of seconds since the previous waypoint's time.
/// </summary>
Seconds,
/// <summary>
/// The time is stored as a 2-byte unsigned integer, and shows the number of milliseconds since the previous waypoint's time.
/// </summary>
Milliseconds
}
/// <summary>
/// The storage mode for the position (latitude, longitude, altitude) of a waypoint.
/// </summary>
private enum PositionStorageMode
{
/// <summary>
/// The longitude and latitude are stored as microdegrees in 4-byte signed integers, and the altitude is stored as decimeters in a 3-byte signed integer.
/// </summary>
Full,
/// <summary>
/// The longitude and latitude are stored as microdegrees relative to the previous waypoint in 2-byte signed integers, and the altitude is stored as decimeters relative to the previous waypoint in a 1-byte signed integer.
/// </summary>
BigDelta,
/// <summary>
/// The longitude and latitude are stored as microdegrees relative to the previous waypoint in 1-byte signed integers, and the altitude is stored as decimeters relative to the previous waypoint in a 1-byte signed integer.
/// </summary>
SmallDelta
}
}
/// <summary>
/// The type of waypoint.
/// </summary>
public enum IofXml30WaypointType
{
/// <summary>
/// A normal waypoint.
/// </summary>
Normal,
/// <summary>
/// A waypoint that is the last waypoint before an interruption in the route occurs.
/// </summary>
Interruption
}
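The decoder above converts big-endian fields into integers by reversing byte arrays before calling BitConverter. That approach relies on the host being little-endian, and it ignores the return value of Stream.Read. The helpers below are a sketch of an alternative. They assume the fields are big-endian two's-complement values, as the reversals imply. Each helper reads exactly n bytes, combines them most significant byte first, and sign-extends signed fields such as the 3-byte altitude. BigEndianReader, ReadUnsigned and ReadSigned are hypothetical names, not part of the original code.

```csharp
using System;
using System.IO;

// Hypothetical helpers for big-endian fields of 1 to 8 bytes.
public static class BigEndianReader
{
    // Reads exactly byteCount bytes and combines them most significant byte first.
    public static ulong ReadUnsigned(Stream stream, int byteCount)
    {
        if (byteCount < 1 || byteCount > 8)
            throw new ArgumentOutOfRangeException(nameof(byteCount));
        ulong value = 0;
        for (int i = 0; i < byteCount; i++)
        {
            int b = stream.ReadByte();
            if (b == -1)
                throw new EndOfStreamException("Stream ended in the middle of a field.");
            value = (value << 8) | (uint)b;
        }
        return value;
    }

    // Reads a two's-complement field and sign-extends it to 64 bits.
    public static long ReadSigned(Stream stream, int byteCount)
    {
        ulong raw = ReadUnsigned(stream, byteCount);
        int unusedBits = 64 - 8 * byteCount;
        // Move the field's sign bit up to bit 63, then shift back down arithmetically.
        return (long)(raw << unusedBits) >> unusedBits;
    }
}
```

With these helpers, the full-mode time could be read as BigEndianReader.ReadUnsigned(stream, 6) and the full-mode altitude as (int)BigEndianReader.ReadSigned(stream, 3). This assumes StorageTime is a ulong and StorageAltitude an int, as the existing assignments suggest.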
PHP Decode Script
The PHP script is very similar. I can't attach it because of this post's size limit, but I can provide it if needed.
I need to process a large file, around 400K lines and 200 MB, but sometimes I have to process it from the bottom up. How can I use an iterator (yield return) here? Basically, I don't want to load everything into memory. I know it is more efficient to use an iterator in .NET.
Reading text files backwards is really tricky unless you're using a fixed-size encoding (e.g. ASCII). When you've got a variable-size encoding (such as UTF-8), you will keep having to check whether or not you're in the middle of a character when you fetch data.
There's nothing built into the framework, and I suspect you'd have to do separate hard coding for each variable-width encoding.
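Here is what the "middle of a character" check looks like for UTF-8. Every continuation byte has the bit pattern 10xxxxxx. So from any byte offset, you can step backwards until you reach a byte that is not a continuation byte. This is the same test the UTF-8 characterStartDetector in the code below performs, written the other way round. The following is a minimal sketch that assumes valid UTF-8 input. Utf8Boundary and FindCharStart are hypothetical names.

```csharp
using System;
using System.Text;

public static class Utf8Boundary
{
    // UTF-8 continuation bytes always have the bit pattern 10xxxxxx.
    public static bool IsContinuationByte(byte b) => (b & 0xC0) == 0x80;

    // Steps back from index to the first byte of the character containing it.
    // Assumes data is valid UTF-8.
    public static int FindCharStart(byte[] data, int index)
    {
        while (index > 0 && IsContinuationByte(data[index]))
            index--;
        return index;
    }
}
```

For example, in Encoding.UTF8.GetBytes("a\u20ACb") the euro sign occupies bytes 1 to 3, so FindCharStart(bytes, 3) returns 1. A UTF-8 character is at most four bytes long, so valid input never needs more than three steps back.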
EDIT: This has been somewhat tested - but that's not to say it doesn't still have some subtle bugs around. It uses StreamUtil from MiscUtil, but I've included just the necessary (new) method from there at the bottom. Oh, and it needs refactoring - there's one pretty hefty method, as you'll see:
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Text;
namespace MiscUtil.IO
{
/// <summary>
/// Takes an encoding (defaulting to UTF-8) and a function which produces a seekable stream
/// (or a filename for convenience) and yields lines from the end of the stream backwards.
/// Only single byte encodings, and UTF-8 and Unicode, are supported. The stream
/// returned by the function must be seekable.
/// </summary>
public sealed class ReverseLineReader : IEnumerable<string>
{
/// <summary>
/// Buffer size to use by default. Classes with internal access can specify
/// a different buffer size - this is useful for testing.
/// </summary>
private const int DefaultBufferSize = 4096;
/// <summary>
/// Means of creating a Stream to read from.
/// </summary>
private readonly Func<Stream> streamSource;
/// <summary>
/// Encoding to use when converting bytes to text
/// </summary>
private readonly Encoding encoding;
/// <summary>
/// Size of buffer (in bytes) to read each time we read from the
/// stream. This must be at least as big as the maximum number of
/// bytes for a single character.
/// </summary>
private readonly int bufferSize;
/// <summary>
/// Function which, when given a position within a file and a byte, states whether
/// or not the byte represents the start of a character.
/// </summary>
private Func<long,byte,bool> characterStartDetector;
/// <summary>
/// Creates a LineReader from a stream source. The delegate is only
/// called when the enumerator is fetched. UTF-8 is used to decode
/// the stream into text.
/// </summary>
/// <param name="streamSource">Data source</param>
public ReverseLineReader(Func<Stream> streamSource)
: this(streamSource, Encoding.UTF8)
{
}
/// <summary>
/// Creates a LineReader from a filename. The file is only opened
/// (or even checked for existence) when the enumerator is fetched.
/// UTF8 is used to decode the file into text.
/// </summary>
/// <param name="filename">File to read from</param>
public ReverseLineReader(string filename)
: this(filename, Encoding.UTF8)
{
}
/// <summary>
/// Creates a LineReader from a filename. The file is only opened
/// (or even checked for existence) when the enumerator is fetched.
/// </summary>
/// <param name="filename">File to read from</param>
/// <param name="encoding">Encoding to use to decode the file into text</param>
public ReverseLineReader(string filename, Encoding encoding)
: this(() => File.OpenRead(filename), encoding)
{
}
/// <summary>
/// Creates a LineReader from a stream source. The delegate is only
/// called when the enumerator is fetched.
/// </summary>
/// <param name="streamSource">Data source</param>
/// <param name="encoding">Encoding to use to decode the stream into text</param>
public ReverseLineReader(Func<Stream> streamSource, Encoding encoding)
: this(streamSource, encoding, DefaultBufferSize)
{
}
internal ReverseLineReader(Func<Stream> streamSource, Encoding encoding, int bufferSize)
{
this.streamSource = streamSource;
this.encoding = encoding;
this.bufferSize = bufferSize;
if (encoding.IsSingleByte)
{
// For a single byte encoding, every byte is the start (and end) of a character
characterStartDetector = (pos, data) => true;
}
else if (encoding is UnicodeEncoding)
{
// For UTF-16, even-numbered positions are the start of a character.
// TODO: This assumes no surrogate pairs. More work required
// to handle that.
characterStartDetector = (pos, data) => (pos & 1) == 0;
}
else if (encoding is UTF8Encoding)
{
// For UTF-8, bytes with the top bit clear or the second bit set are the start of a character
// See http://www.cl.cam.ac.uk/~mgk25/unicode.html
characterStartDetector = (pos, data) => (data & 0x80) == 0 || (data & 0x40) != 0;
}
else
{
throw new ArgumentException("Only single byte, UTF-8 and Unicode encodings are permitted");
}
}
/// <summary>
/// Returns the enumerator reading strings backwards. If this method discovers that
/// the returned stream is either unreadable or unseekable, a NotSupportedException is thrown.
/// </summary>
public IEnumerator<string> GetEnumerator()
{
Stream stream = streamSource();
if (!stream.CanSeek)
{
stream.Dispose();
throw new NotSupportedException("Unable to seek within stream");
}
if (!stream.CanRead)
{
stream.Dispose();
throw new NotSupportedException("Unable to read within stream");
}
return GetEnumeratorImpl(stream);
}
private IEnumerator<string> GetEnumeratorImpl(Stream stream)
{
try
{
long position = stream.Length;
if (encoding is UnicodeEncoding && (position & 1) != 0)
{
throw new InvalidDataException("UTF-16 encoding provided, but stream has odd length.");
}
// Allow up to three bytes for data from the start of the previous
// read which didn't quite make it as full characters (a UTF-8 character
// can be a start byte followed by up to three continuation bytes)
byte[] buffer = new byte[bufferSize + 3];
char[] charBuffer = new char[encoding.GetMaxCharCount(buffer.Length)];
int leftOverData = 0;
String previousEnd = null;
// TextReader doesn't return an empty string if there's a line break at the end
// of the data. Therefore we don't return an empty string if it's our *first*
// return.
bool firstYield = true;
// A line-feed at the start of the previous buffer means we need to swallow
// the carriage-return at the end of this buffer - hence this needs declaring
// way up here!
bool swallowCarriageReturn = false;
while (position > 0)
{
int bytesToRead = Math.Min(position > int.MaxValue ? bufferSize : (int)position, bufferSize);
position -= bytesToRead;
stream.Position = position;
StreamUtil.ReadExactly(stream, buffer, bytesToRead);
// If we haven't read a full buffer, but we had bytes left
// over from before, copy them to the end of the buffer
if (leftOverData > 0 && bytesToRead != bufferSize)
{
// Buffer.BlockCopy doesn't document its behaviour with respect
// to overlapping data: we *might* just have read 7 bytes instead of
// 8, and have two bytes to copy...
Array.Copy(buffer, bufferSize, buffer, bytesToRead, leftOverData);
}
// We've now *effectively* read this much data.
bytesToRead += leftOverData;
int firstCharPosition = 0;
while (!characterStartDetector(position + firstCharPosition, buffer[firstCharPosition]))
{
firstCharPosition++;
// Bad UTF-8 sequences could trigger this. For UTF-8 we should always
// see a valid character start in every 4 bytes, and if this is the start of the file
// so we've done a short read, we should have the character start
// somewhere in the usable buffer.
if (firstCharPosition == 4 || firstCharPosition == bytesToRead)
{
throw new InvalidDataException("Invalid UTF-8 data");
}
}
leftOverData = firstCharPosition;
int charsRead = encoding.GetChars(buffer, firstCharPosition, bytesToRead - firstCharPosition, charBuffer, 0);
int endExclusive = charsRead;
for (int i = charsRead - 1; i >= 0; i--)
{
char lookingAt = charBuffer[i];
if (swallowCarriageReturn)
{
swallowCarriageReturn = false;
if (lookingAt == '\r')
{
endExclusive--;
continue;
}
}
// Anything non-line-breaking, just keep looking backwards
if (lookingAt != '\n' && lookingAt != '\r')
{
continue;
}
// End of CRLF? Swallow the preceding CR
if (lookingAt == '\n')
{
swallowCarriageReturn = true;
}
int start = i + 1;
string bufferContents = new string(charBuffer, start, endExclusive - start);
endExclusive = i;
string stringToYield = previousEnd == null ? bufferContents : bufferContents + previousEnd;
if (!firstYield || stringToYield.Length != 0)
{
yield return stringToYield;
}
firstYield = false;
previousEnd = null;
}
previousEnd = endExclusive == 0 ? null : (new string(charBuffer, 0, endExclusive) + previousEnd);
// If we didn't decode the start of the array, put it at the end for next time
if (leftOverData != 0)
{
Buffer.BlockCopy(buffer, 0, buffer, bufferSize, leftOverData);
}
}
if (leftOverData != 0)
{
// At the start of the final buffer, we had the end of another character.
throw new InvalidDataException("Invalid UTF-8 data at start of stream");
}
if (firstYield && string.IsNullOrEmpty(previousEnd))
{
yield break;
}
yield return previousEnd ?? "";
}
finally
{
stream.Dispose();
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
}
// StreamUtil.cs:
public static class StreamUtil
{
public static void ReadExactly(Stream input, byte[] buffer, int bytesToRead)
{
int index = 0;
while (index < bytesToRead)
{
int read = input.Read(buffer, index, bytesToRead - index);
if (read == 0)
{
throw new EndOfStreamException
(String.Format("End of stream reached with {0} byte{1} left to read.",
bytesToRead - index,
bytesToRead - index == 1 ? "" : "s"));
}
index += read;
}
}
}
Feedback very welcome. This was fun :)
Warning: this approach doesn't work as intended (explained in the EDIT below).
You could use File.ReadLines to get lines iterator
foreach (var line in File.ReadLines(@"C:\temp\ReverseRead.txt").Reverse())
{
if (noNeedToReadFurther)
break;
// process line here
Console.WriteLine(line);
}
EDIT:
After reading applejacks01's comment, I ran some tests, and it does look like .Reverse() actually loads the whole file.
I used File.ReadLines() to print the first line of a 40 MB file: memory usage of the console app was 5 MB. Then I used File.ReadLines().Reverse() to print the last line of the same file: memory usage was 95 MB.
Conclusion
`Reverse()` has to buffer the whole sequence before it can return the first element, so it is not a good choice for reading the bottom of a big file.
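The measurement is consistent with how Enumerable.Reverse works. It cannot yield the last element until it has seen it, so it reads the entire source into a buffer before returning anything. The sketch below makes that visible without needing a large file. CountingSource is a hypothetical stand-in for File.ReadLines that counts how many lines it has produced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReverseBufferingDemo
{
    // How many lines the source has produced so far.
    public static int Produced;

    // Hypothetical stand-in for File.ReadLines: yields lines lazily and counts them.
    public static IEnumerable<string> CountingSource(int lineCount)
    {
        for (int i = 1; i <= lineCount; i++)
        {
            Produced++;
            yield return "line " + i;
        }
    }
}
```

Calling CountingSource(1000).First() leaves Produced at 1. By contrast, CountingSource(1000).Reverse().First() returns "line 1000" only after Produced has reached 1000, that is, after the whole source has been enumerated.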
Very fast solution for huge files: From C#, use PowerShell's Get-Content with the Tail parameter.
using System.Linq; // for FirstOrDefault()
using System.Management.Automation;
using (PowerShell powerShell = PowerShell.Create())
{
string lastLine = powerShell.AddCommand("Get-Content")
.AddParameter("Path", @"c:\a.txt")
.AddParameter("Tail", 1)
.Invoke().FirstOrDefault()?.ToString();
}
Required reference: 'System.Management.Automation.dll' - may be somewhere like 'C:\Program Files (x86)\Reference Assemblies\Microsoft\WindowsPowerShell\3.0'
Using PowerShell incurs a small overhead but is worth it for huge files.
To create a file iterator you can do this:
EDIT:
This is my fixed version of a fixed-width reverse file reader:
public static IEnumerable<string> readFile()
{
// Single-byte encodings only: each byte is cast directly to a char
using (FileStream reader = new FileStream(@"c:\test.txt", FileMode.Open, FileAccess.Read))
{
long i = 0;
StringBuilder lineBuffer = new StringBuilder();
int byteRead;
while (-i < reader.Length)
{
reader.Seek(--i, SeekOrigin.End);
byteRead = reader.ReadByte();
if (byteRead == '\n')
{
// A newline that is the very last byte of the file doesn't start a new line
if (i != -1)
yield return Reverse(lineBuffer.ToString());
lineBuffer.Clear();
}
else if (byteRead != '\r') // drop the CR of a CRLF line ending
{
lineBuffer.Append((char)byteRead);
}
}
yield return Reverse(lineBuffer.ToString());
}
}
public static string Reverse(string str)
{
char[] arr = new char[str.Length];
for (int i = 0; i < str.Length; i++)
arr[i] = str[str.Length - 1 - i];
return new string(arr);
}
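The reader above issues one Seek and one ReadByte call for every byte. The sketch below follows the same idea but reads fixed-size blocks from the end of the stream instead. It keeps the same assumption of a single-byte encoding. BlockReverseReader, ReadLinesBackwards and the 4096-byte default block size are illustrative choices, not from the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BlockReverseReader
{
    // Yields lines from last to first, reading blockSize bytes at a time from the end.
    // Assumes a single-byte encoding (each byte is cast to one char), splits on '\n'
    // and ignores '\r' bytes so that CRLF files work too.
    public static IEnumerable<string> ReadLinesBackwards(Stream stream, int blockSize = 4096)
    {
        var reversedLine = new List<char>(); // current line, last character first
        var block = new byte[blockSize];
        long position = stream.Length;
        bool atEndOfData = true; // a '\n' that is the very last byte does not begin a new line

        while (position > 0)
        {
            int toRead = (int)Math.Min(blockSize, position);
            position -= toRead;
            stream.Position = position;
            int filled = 0;
            while (filled < toRead) // Stream.Read may return fewer bytes than requested
            {
                int n = stream.Read(block, filled, toRead - filled);
                if (n == 0) throw new EndOfStreamException();
                filled += n;
            }

            for (int i = toRead - 1; i >= 0; i--)
            {
                byte b = block[i];
                if (b == (byte)'\n')
                {
                    if (!atEndOfData)
                        yield return ToForwardString(reversedLine);
                    reversedLine.Clear();
                }
                else if (b != (byte)'\r')
                {
                    reversedLine.Add((char)b);
                }
                atEndOfData = false;
            }
        }

        if (!atEndOfData) // an empty stream yields no lines
            yield return ToForwardString(reversedLine);
    }

    private static string ToForwardString(List<char> reversed)
    {
        char[] chars = reversed.ToArray();
        Array.Reverse(chars);
        return new string(chars);
    }
}
```

The method takes any seekable stream, for example a FileStream opened in a using block. Its result can be consumed with foreach or LINQ, e.g. ReadLinesBackwards(stream).Take(10) for the last ten lines.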
I'll add my solution too. After reading some answers, nothing really fit my case.
I read byte by byte from the end until I find a line feed, then return the collected bytes as a string, without any buffering.
Usage:
var reader = new ReverseTextReader(File.OpenRead(path), Encoding.UTF8);
while (!reader.EndOfStream)
{
Console.WriteLine(reader.ReadLine());
}
Implementation:
using System.Collections.Generic;
using System.IO;
using System.Text;
public class ReverseTextReader
{
private const int LineFeedLf = 10;
private const int LineFeedCr = 13;
private readonly Stream _stream;
private readonly Encoding _encoding;
public bool EndOfStream => _stream.Position == 0;
public ReverseTextReader(Stream stream, Encoding encoding)
{
_stream = stream;
_encoding = encoding;
_stream.Position = _stream.Length;
}
public string ReadLine()
{
if (_stream.Position == 0) return null;
var line = new List<byte>();
while (true)
{
var b = _stream.ReadByteFromBehind();
// Stop at the start of the stream or at the line feed that ends the previous line
if (b == -1 || b == LineFeedLf) break;
// Skip carriage returns so CRLF endings are not returned as part of the line
if (b != LineFeedCr) line.Add((byte)b);
}
line.Reverse();
return _encoding.GetString(line.ToArray());
}
}
public static class StreamExtensions
{
public static int ReadByteFromBehind(this Stream stream)
{
if (stream.Position == 0) return -1;
stream.Position = stream.Position - 1;
var value = stream.ReadByte();
stream.Position = stream.Position - 1;
return value;
}
}
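ReverseTextReader above splits lines by looking for the LineFeed byte (10) before decoding. That is safe for UTF-8, because every byte of a multi-byte UTF-8 sequence is 0x80 or higher. It is not safe for UTF-16, where 0x0A also appears inside ordinary characters. Here is a small check of both claims (LineFeedScan is a hypothetical helper):

```csharp
using System;
using System.Text;

public static class LineFeedScan
{
    // Counts the bytes equal to 0x0A in the encoded form of text.
    public static int CountLineFeedBytes(string text, Encoding encoding)
    {
        int count = 0;
        foreach (byte b in encoding.GetBytes(text))
            if (b == 0x0A) count++;
        return count;
    }
}
```

Take the string "\u010A" (Ċ), which contains no line break. CountLineFeedBytes returns 0 for it with Encoding.UTF8 (bytes C4 8A) but 1 with Encoding.Unicode (bytes 0A 01). So a byte-level scan would split that character in UTF-16.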
I read the file into a list line by line, then reversed it with ArrayList.Reverse():
StreamReader objReader = new StreamReader(filename);
string sLine = "";
ArrayList arrText = new ArrayList();
while (sLine != null)
{
sLine = objReader.ReadLine();
if (sLine != null)
arrText.Add(sLine);
}
objReader.Close();
arrText.Reverse();
foreach (string sOutput in arrText)
{
...
You can read the file backwards one character at a time and collect characters until you reach a carriage return and/or line feed.
You then reverse the collected string and yield it as a line.
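A minimal sketch of that description follows. It works on an in-memory string so the logic is easy to follow. A file-based version would seek backwards through the stream instead, and would have to respect the encoding. CharReverseLines and ReverseLines are hypothetical names. "\r\n", "\n" and a lone "\r" each count as one line break, as they do for StreamReader.ReadLine.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CharReverseLines
{
    // Walks text from the last character to the first, collecting characters until a
    // line break, then reverses the collected characters and yields them as one line.
    public static IEnumerable<string> ReverseLines(string text)
    {
        var pending = new StringBuilder(); // the current line, collected back to front
        int i = text.Length - 1;

        // Like StreamReader.ReadLine, don't report an empty line after a final line break.
        if (text.EndsWith("\r\n", StringComparison.Ordinal)) i -= 2;
        else if (text.EndsWith("\n", StringComparison.Ordinal) || text.EndsWith("\r", StringComparison.Ordinal)) i -= 1;

        for (; i >= 0; i--)
        {
            char c = text[i];
            if (c == '\n' || c == '\r')
            {
                yield return Reversed(pending);
                pending.Clear();
                // Treat "\r\n" as a single break by skipping the '\r' before this '\n'.
                if (c == '\n' && i > 0 && text[i - 1] == '\r') i--;
            }
            else
            {
                pending.Append(c);
            }
        }

        if (text.Length > 0)
            yield return Reversed(pending);
    }

    private static string Reversed(StringBuilder backwards)
    {
        char[] chars = backwards.ToString().ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }
}
```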
There are good answers here already, but here's another LINQ-compatible class you can use, which focuses on performance and support for large files. It assumes that every line, including the last one, ends with a "\r\n" terminator.
Usage:
var reader = new ReverseTextReader(@"C:\Temp\ReverseTest.txt");
foreach (var line in reader)
Console.WriteLine(line);
ReverseTextReader Class:
/// <summary>
/// Reads a text file backwards, line-by-line.
/// </summary>
/// <remarks>This class uses file seeking to read a text file of any size in reverse order. This
/// is useful for needs such as reading a log file newest-entries first.</remarks>
public sealed class ReverseTextReader : IEnumerable<string>
{
private const int BufferSize = 16384; // The number of bytes read from the underlying stream.
private readonly Stream _stream; // Stores the stream feeding data into this reader
private readonly Encoding _encoding; // Stores the encoding used to process the file
private byte[] _leftoverBuffer; // Stores the leftover partial line after processing a buffer
private readonly Queue<string> _lines; // Stores the lines parsed from the buffer
#region Constructors
/// <summary>
/// Creates a reader for the specified file.
/// </summary>
/// <param name="filePath"></param>
public ReverseTextReader(string filePath)
: this(new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read), Encoding.Default)
{ }
/// <summary>
/// Creates a reader using the specified stream.
/// </summary>
/// <param name="stream"></param>
public ReverseTextReader(Stream stream)
: this(stream, Encoding.Default)
{ }
/// <summary>
/// Creates a reader using the specified path and encoding.
/// </summary>
/// <param name="filePath"></param>
/// <param name="encoding"></param>
public ReverseTextReader(string filePath, Encoding encoding)
: this(new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read), encoding)
{ }
/// <summary>
/// Creates a reader using the specified stream and encoding.
/// </summary>
/// <param name="stream"></param>
/// <param name="encoding"></param>
public ReverseTextReader(Stream stream, Encoding encoding)
{
_stream = stream;
_encoding = encoding;
_lines = new Queue<string>(128);
// The stream needs to support seeking for this to work
if(!_stream.CanSeek)
throw new InvalidOperationException("The specified stream needs to support seeking to be read backwards.");
if (!_stream.CanRead)
throw new InvalidOperationException("The specified stream needs to support reading to be read backwards.");
// Set the current position to the end of the file
_stream.Position = _stream.Length;
_leftoverBuffer = new byte[0];
}
#endregion
#region Overrides
/// <summary>
/// Reads the next previous line from the underlying stream.
/// </summary>
/// <returns></returns>
public string ReadLine()
{
// Are there lines left to read? If so, return the next one
if (_lines.Count != 0) return _lines.Dequeue();
// Are we at the beginning of the stream? If so, we're done
if (_stream.Position == 0) return null;
#region Read and Process the Next Chunk
// Remember the current position
var currentPosition = _stream.Position;
var newPosition = currentPosition - BufferSize;
// Are we before the beginning of the stream?
if (newPosition < 0) newPosition = 0;
// Calculate the buffer size to read
var count = (int)(currentPosition - newPosition);
// Set the new position
_stream.Position = newPosition;
// Make a new buffer but append the previous leftovers
var buffer = new byte[count + _leftoverBuffer.Length];
// Read the next buffer
_stream.Read(buffer, 0, count);
// Move the position of the stream back
_stream.Position = newPosition;
// And copy in the leftovers from the last buffer
if (_leftoverBuffer.Length != 0)
Array.Copy(_leftoverBuffer, 0, buffer, count, _leftoverBuffer.Length);
// Look for CrLf delimiters
var end = buffer.Length - 1;
var start = buffer.Length - 2;
// Search backwards for a line feed
while (start >= 0)
{
// Is it a line feed?
if (buffer[start] == 10)
{
// Yes. Extract a line and queue it (but exclude the \r\n)
_lines.Enqueue(_encoding.GetString(buffer, start + 1, end - start - 2));
// And reset the end
end = start;
}
// Move to the previous character
start--;
}
// What's left over is a portion of a line. Save it for later.
_leftoverBuffer = new byte[end + 1];
Array.Copy(buffer, 0, _leftoverBuffer, 0, end + 1);
// Are we at the beginning of the stream?
if (_stream.Position == 0)
// Yes. Add the last line.
_lines.Enqueue(_encoding.GetString(_leftoverBuffer, 0, end - 1));
#endregion
// If we have something in the queue, return it
return _lines.Count == 0 ? null : _lines.Dequeue();
}
#endregion
#region IEnumerator<string> Interface
public IEnumerator<string> GetEnumerator()
{
string line;
// So long as the next line isn't null...
while ((line = ReadLine()) != null)
// Read and return it.
yield return line;
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
#endregion
}
I know this post is very old, but as I couldn't figure out how to use the most-voted solution, I kept searching. Here is the best answer I found with a low memory cost, in both VB and C#:
http://www.blakepell.com/2010-11-29-backward-file-reader-vb-csharp-source
I hope this helps others, because it took me hours to find this post!
[Edit]
Here is the C# code:
//*********************************************************************************************************************************
//
// Class: BackwardReader
// Initial Date: 11/29/2010
// Last Modified: 11/29/2010
// Programmer(s): Original C# Source - the_real_herminator
// http://social.msdn.microsoft.com/forums/en-US/csharpgeneral/thread/9acdde1a-03cd-4018-9f87-6e201d8f5d09
// VB Conversion - Blake Pell
//
//*********************************************************************************************************************************
using System.Text;
using System.IO;
public class BackwardReader
{
private string path;
private FileStream fs = null;
public BackwardReader(string path)
{
this.path = path;
fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
fs.Seek(0, SeekOrigin.End);
}
public string Readline()
{
byte[] line;
byte[] text = new byte[1];
long position = 0;
int count;
fs.Seek(0, SeekOrigin.Current);
position = fs.Position;
//do we have a trailing \r\n?
if (fs.Length > 1)
{
byte[] vagnretur = new byte[2];
fs.Seek(-2, SeekOrigin.Current);
fs.Read(vagnretur, 0, 2);
if (ASCIIEncoding.ASCII.GetString(vagnretur).Equals("\r\n"))
{
//move it back
fs.Seek(-2, SeekOrigin.Current);
position = fs.Position;
}
}
while (fs.Position > 0)
{
text.Initialize();
//read one char
fs.Read(text, 0, 1);
string asciiText = ASCIIEncoding.ASCII.GetString(text);
//move back to the character before
fs.Seek(-2, SeekOrigin.Current);
if (asciiText.Equals("\n"))
{
fs.Read(text, 0, 1);
asciiText = ASCIIEncoding.ASCII.GetString(text);
if (asciiText.Equals("\r"))
{
fs.Seek(1, SeekOrigin.Current);
break;
}
//not a CRLF pair: step back again, otherwise the same \n is read forever
fs.Seek(-1, SeekOrigin.Current);
}
}
count = (int)(position - fs.Position);
line = new byte[count];
fs.Read(line, 0, count);
fs.Seek(-count, SeekOrigin.Current);
return ASCIIEncoding.ASCII.GetString(line);
}
public bool SOF
{
get
{
return fs.Position == 0;
}
}
public void Close()
{
fs.Close();
}
}
I wanted to do something similar.
Here is my code. This class creates temporary files containing chunks of the big file, which avoids bloating memory. The user can specify whether they want the file reversed; if so, the content is returned in reverse order.
This class can also be used to write big data to a single file without bloating memory.
Please provide feedback.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace BigFileService
{
public class BigFileDumper
{
/// <summary>
/// Maximum number of lines written to each temporary chunk file.
/// When the full file is assembled, one chunk file at a time is loaded into memory.
/// </summary>
public int CHUNK_SIZE = 1000;
public bool ReverseIt { get; set; }
public long TotalLineCount { get { return totalLineCount; } }
private long totalLineCount;
private int BufferCount = 0;
private StreamWriter Writer;
/// <summary>
/// List of files that would store the chunks.
/// </summary>
private List<string> LstTempFiles;
private string ParentDirectory;
private char[] trimchars = { '/', '\\'};
public BigFileDumper(string FolderPathToWrite)
{
this.LstTempFiles = new List<string>();
this.ParentDirectory = FolderPathToWrite.TrimEnd(trimchars) + "\\" + "BIG_FILE_DUMP";
this.totalLineCount = 0;
this.BufferCount = 0;
this.Initialize();
}
private void Initialize()
{
// Delete existing directory.
if (Directory.Exists(this.ParentDirectory))
{
Directory.Delete(this.ParentDirectory, true);
}
// Create a new directory.
Directory.CreateDirectory(this.ParentDirectory);
}
public void WriteLine(string line)
{
if (this.BufferCount == 0)
{
string newFile = "DumpFile_" + LstTempFiles.Count();
LstTempFiles.Add(newFile);
Writer = new StreamWriter(this.ParentDirectory + "\\" + newFile);
}
// Keep on adding in the buffer as long as size is okay.
if (this.BufferCount < this.CHUNK_SIZE)
{
this.totalLineCount++; // main count
this.BufferCount++; // Chunk count.
Writer.WriteLine(line);
}
else
{
// Buffer is full, time to create a new file.
// Close the existing file first.
Writer.Close();
// Make buffer count 0 again.
this.BufferCount = 0;
this.WriteLine(line);
}
}
public void Close()
{
if (Writer != null)
Writer.Close();
}
public string GetFullFile()
{
if (LstTempFiles.Count <= 0)
{
Debug.Assert(false, "There are no files created.");
return "";
}
string returnFilename = this.ParentDirectory + "\\" + "FullFile";
if (File.Exists(returnFilename) == false)
{
// Create a consolidated file from the existing small dump files.
// Now this is interesting. We will open the small dump files one by one.
// Depending on whether the user require inverted file, we will read them in descending order & reverted,
// or ascending order in normal way.
if (this.ReverseIt)
this.LstTempFiles.Reverse();
foreach (var fileName in LstTempFiles)
{
string fullFileName = this.ParentDirectory + "\\" + fileName;
// FileLines will use small memory depending on size of CHUNK. User has control.
var fileLines = File.ReadAllLines(fullFileName);
// Time to write in the writer.
if (this.ReverseIt)
fileLines = fileLines.Reverse().ToArray();
// Write the lines
File.AppendAllLines(returnFilename, fileLines);
}
}
return returnFilename;
}
}
}
This class can be used as follows:
void TestBigFileDump_File(string BIG_FILE, string FOLDER_PATH_FOR_CHUNK_FILES)
{
// Start processing the input Big file.
StreamReader reader = new StreamReader(BIG_FILE);
// Create a dump file class object to handle efficient memory management.
var bigFileDumper = new BigFileDumper(FOLDER_PATH_FOR_CHUNK_FILES);
// Set to reverse the output file.
bigFileDumper.ReverseIt = true;
bigFileDumper.CHUNK_SIZE = 100; // Lines per temporary chunk file, i.e. how many lines are held in RAM at a time when assembling the output.
while (reader.EndOfStream == false)
{
string line = reader.ReadLine();
bigFileDumper.WriteLine(line);
}
bigFileDumper.Close();
reader.Close();
// Get back full reversed file.
var reversedFilename = bigFileDumper.GetFullFile();
Console.WriteLine("Check output file - " + reversedFilename);
}
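Sometimes the goal is only to enumerate the lines in reverse, as in the original question, rather than to produce a consolidated file. In that case the same chunk-file idea can be exposed as an iterator. It writes the input in chunks of chunkSize lines, then reads the chunk files from last to first, holding one chunk in memory at a time. The sketch below assumes the caller supplies an empty, writable scratch directory and a positive chunk size. ChunkedReverse and ReverseViaChunks are hypothetical names. Nothing can be yielded until the whole input has been read, since the last line comes first.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ChunkedReverse
{
    // Yields the lines of source in reverse order, keeping at most chunkSize lines in memory.
    // Chunk files go into scratchDirectory and are deleted as they are consumed;
    // stopping the enumeration early leaves the remaining chunk files behind.
    public static IEnumerable<string> ReverseViaChunks(IEnumerable<string> source, int chunkSize, string scratchDirectory)
    {
        var chunkFiles = new List<string>();
        var buffer = new List<string>(chunkSize);
        foreach (string line in source)
        {
            buffer.Add(line);
            if (buffer.Count == chunkSize)
            {
                chunkFiles.Add(WriteChunk(buffer, scratchDirectory, chunkFiles.Count));
                buffer.Clear();
            }
        }

        // The final partial chunk is still in memory, so emit it directly.
        for (int i = buffer.Count - 1; i >= 0; i--)
            yield return buffer[i];

        for (int f = chunkFiles.Count - 1; f >= 0; f--)
        {
            string[] lines = File.ReadAllLines(chunkFiles[f]);
            File.Delete(chunkFiles[f]);
            for (int i = lines.Length - 1; i >= 0; i--)
                yield return lines[i];
        }
    }

    private static string WriteChunk(List<string> lines, string directory, int index)
    {
        string path = Path.Combine(directory, "chunk_" + index + ".txt");
        File.WriteAllLines(path, lines);
        return path;
    }
}
```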
In case anyone else comes across this, I solved it with the following PowerShell script, which can be ported to C# with little effort.
[System.IO.FileStream]$fileStream = [System.IO.File]::Open("C:\Name_of_very_large_file.log", [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read, [System.IO.FileShare]::ReadWrite)
[System.IO.BufferedStream]$bs = New-Object System.IO.BufferedStream $fileStream;
[System.IO.StreamReader]$sr = New-Object System.IO.StreamReader $bs;
$seek = $bs.Seek([Math]::Max([long]0, $fileStream.Length - 10000), [System.IO.SeekOrigin]::Begin);
while(($line = $sr.ReadLine()) -ne $null)
{
$line;
}
This starts reading 10,000 bytes (not characters) before the end of the file and outputs each line from there to the end; the first line printed may be partial.
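For reference, a C# version of the script might look like the sketch below. It is not a direct translation. It clamps the seek so that files shorter than the tail size still work. It also seeks by bytes, so the first line returned may be partial, and for multi-byte encodings it may start mid-character. TailReader and ReadTailLines are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class TailReader
{
    // Returns the lines found in the last tailBytes bytes of the file, in file order.
    // The first returned line may be cut off, because the seek lands on an arbitrary byte.
    public static List<string> ReadTailLines(string path, long tailBytes)
    {
        var lines = new List<string>();
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            stream.Seek(Math.Max(0, stream.Length - tailBytes), SeekOrigin.Begin);
            using (var reader = new StreamReader(stream))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                    lines.Add(line);
            }
        }
        return lines;
    }
}
```

FileShare.ReadWrite matches the script and allows reading a log file that another process is still writing.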