Data buffering from a file: code not working as intended - C#

I am not very experienced in C#, but have lots of experience from other languages.
I am doing a project in C# where I have to read and modify large files.
For this I have coded a buffering scheme where I keep chunks of data in memory and swap them to disk when I need to read more. I always eliminate the [0] element from the array by moving the following elements back one position.
public struct TBBuffer
{
    public long offset;
    public short[] data;
    public GCHandle dataHandle; // GCHandle comes from System.Runtime.InteropServices
}
// tb.buffer is a TBBuffer[]; each data[] is initialized to 4096 elements.
If I use a small sample file, where everything fits in the buffers allocated, everything works as intended.
Whenever I need to free up some memory for more data I do:
int bufIdx, bufNo;
for (bufIdx = 0; bufIdx < tb.buffer.Length - 1; bufIdx++)
{
    tb.buffer[bufIdx] = tb.buffer[bufIdx + 1];
}
bufNo = tb.Length - 1;
I have determined that the above code is the source of the problem, but I am unable to find out why that is so.
So my question is: considering the TBBuffer struct and its contents, does anybody have a clue why this is not working as expected?
Is there a more efficient way to do this?

Are you looking for array resize?
Array.Resize(ref tb.buffer, newSize);
Just state your intention to .NET and let it do the work (efficiently) for you.

In your code you use tb.Length. I think it should be tb.buffer.Length:
int bufIdx, bufNo;
for (bufIdx = 0; bufIdx < tb.buffer.Length - 1; bufIdx++)
{
    tb.buffer[bufIdx] = tb.buffer[bufIdx + 1];
}
bufNo = tb.buffer.Length - 1;

I solved my problem... Rather embarrassing, but the problem was that the data[] of the very last buffer would point at the same array as the next-to-last one after I did my for () move stunt. After adding a tb.buffer[bufNo].data = new short[4096]; statement everything is back to normal. Thank you for your time, and all the constructive feedback. I will be looking into memory-mapped files to see whether that feature will be a better option.
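For completeness, here is a minimal self-contained sketch of the aliasing problem and the fix (the ShiftDemo class and DropFirst method are just for illustration, not part of my project):

using System;
using System.Runtime.InteropServices;

public struct TBBuffer
{
    public long offset;
    public short[] data;
    public GCHandle dataHandle;
}

class ShiftDemo
{
    // Drops element [0] by shifting everything down one slot.
    static void DropFirst(TBBuffer[] buffer)
    {
        for (int bufIdx = 0; bufIdx < buffer.Length - 1; bufIdx++)
        {
            // Copying the struct copies the reference held in .data, so after
            // this loop the last two slots point at the SAME short[] instance.
            buffer[bufIdx] = buffer[bufIdx + 1];
        }

        int bufNo = buffer.Length - 1;
        // The fix: give the now-free last slot its own data array.
        buffer[bufNo].data = new short[4096];
    }
}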

Related

Using pointers, memory gets overwritten, how to "reserve" memory?

I am trying to create a class that simulates an array and makes use of pointers. Everything works well until my pointer values are overwritten.
Here's a sample of my code; this is the indexer that I use to get / set the values. As I said, everything works well until at some point the address is overwritten with some other random values.
How can I fix or "reserve" the space for the length of the "array"?
public int this[int x]
{
    get
    {
        if (x >= _length || x < 0)
        {
            throw new IndexOutOfRangeException();
        }
        int* offsetToReturn = indexZeroPointer + x;
        return *offsetToReturn;
    }
    set
    {
        if (x >= _length || x < 0)
        {
            throw new IndexOutOfRangeException();
        }
        int* offset = indexZeroPointer + x;
        *offset = value;
    }
}
As index 0 of the array I used the address of a random integer I declared in the class:
indexZeroPointer = &someValue;
You can't just randomly take addresses of objects in C#. You're in a managed memory environment where the locations in virtual memory space can (and will) change.
By taking the address of a single integer, you're at best getting four bytes of memory to use. You can't just access the memory behind the allocated piece and hope for the best - not only does it change (due to memory relocation), it will also be taken up by others. This is especially true if you got the address from a local, which would be allocated on the stack - you're rewriting the stack willy-nilly.
If you want to use pointers (relatively) safely, you need to ensure that the memory you use is actually allocated, and persisted as long as necessary. For example, if you know the length in advance, you can use this piece of code to get the "zero address":
var _length = 10;
var indexZeroPointer = (int*)Marshal.AllocHGlobal(_length * sizeof(int)).ToPointer();
This is just the very beginning of your problems, though. As soon as you enter pointer territory, you lose all the benefits of dealing with managed memory. You'll need to release memory as necessary, get rid of invalid or dangling pointers, handle all the bounds checking and many others.
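To illustrate what that entails, here is a hedged sketch (the UnmanagedIntArray class is illustrative, not the asker's code) that pairs AllocHGlobal with FreeHGlobal and keeps the bounds checks from the question:

using System;
using System.Runtime.InteropServices;

unsafe class UnmanagedIntArray : IDisposable
{
    private readonly int* _indexZeroPointer;
    private readonly int _length;

    public UnmanagedIntArray(int length)
    {
        _length = length;
        // Unmanaged memory: the GC will neither move nor reclaim it.
        _indexZeroPointer = (int*)Marshal.AllocHGlobal(length * sizeof(int)).ToPointer();
    }

    public int this[int x]
    {
        get
        {
            if (x >= _length || x < 0)
                throw new IndexOutOfRangeException();
            return *(_indexZeroPointer + x);
        }
        set
        {
            if (x >= _length || x < 0)
                throw new IndexOutOfRangeException();
            *(_indexZeroPointer + x) = value;
        }
    }

    // Nothing else will ever free this block - you must do it yourself.
    public void Dispose()
    {
        Marshal.FreeHGlobal((IntPtr)_indexZeroPointer);
    }
}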
This is yet another subject where just feeling your way around is going to hurt you. You really want to learn what you're doing, and how the architecture of the computer and the operating system works, and how all this integrates with the .NET memory model. As you just discovered, unsafe code has a tendency of appearing to work, while causing random issues all over the place if you don't know what you're doing (and even if you do - remember the Heartbleed bug and friends?). Make sure you understand how the lower layers work - by using unsafe code, the abstractions that help you avoid understanding that disappear. Low-level coding isn't very friendly.

System.OutOfMemoryException. Creating a big matrix

I have a [3,15000] matrix. I need to compute the covariance matrix of the original matrix and then find its eigenvalues.
This is a part of my code:
double[,] covarianceMatrix = new double[numberOfObjects, numberOfObjects];
for (int n = 0; n < numberOfObjects; n++)
{
    for (int m = 0; m < numberOfObjects; m++)
    {
        double sum = 0;
        for (int k = 0; k < TimeAndRepeats[i, 1]; k++)
        {
            sum += originalMatrix[k, n] * originalMatrix[k, m];
        }
        covarianceMatrix[n, m] = sum / TimeAndRepeats[i, 1];
    }
}
alglib.smatrixevd(covarianceMatrix, numberOfObjects, 1, true, out eigenValues, out eigenVectors);
numberOfObjects here is about 15000.
When I do my computations for a smaller number of objects everything is OK, but for all my data I get an exception.
Is it possible to solve this problem?
I am using macOS, x64.
My environment is MonoDevelop.
double[,] covarianceMatrix = new double[numberOfObjects,numberOfObjects];
You said that your matrix is [3, 15000] and that numberOfObjects is 15000. With this line of code, you're creating a [15000, 15000] matrix of doubles:
15000 * 15000 = 225,000,000 doubles at 8 bytes each: 1,800,000,000 bytes, or 1.8GB.
That's probably why you are running out of memory.
Edit:
According to this question and this question, the size of objects in C# cannot be larger than 2GB. The 1.8GB does not count any additional overhead required to reference the items in the array, so that 1.8GB might actually be > 2GB when everything is accounted for (I can't say without the debugging info; someone with more C# experience might have to set me straight on this). You might consider this workaround if you're trying to work with a really large array, since statically allocated arrays can get messy.
When you create covarianceMatrix, you are creating an object of 15000 * 15000 = 225,000,000 doubles,
so you need 1,800,000,000 bytes of memory. That is why you get the OutOfMemoryException.
The exception name tells you exactly what the problem is. You could use floats instead of doubles to halve the amount of memory needed. Another option would be to create a class for the covariance matrix that saves its data in a disk file, though you'd need to implement proper mechanisms to operate on it, and the performance would be limited as well.
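If the per-object size limit is indeed the culprit, one possible workaround (a sketch, not tested against the asker's setup) is a jagged array: each row is a separate object, so no single allocation has to hold all 15000 * 15000 doubles. You still need roughly 1.8GB in total, just not in one object:

using System;

class JaggedDemo
{
    static void Main()
    {
        int numberOfObjects = 15000;
        // double[n][m] instead of double[n, m]: each row is its own object.
        double[][] covarianceMatrix = new double[numberOfObjects][];
        for (int n = 0; n < numberOfObjects; n++)
            covarianceMatrix[n] = new double[numberOfObjects];

        Console.WriteLine("Allocated {0} rows.", covarianceMatrix.Length);
    }
}

Note that the alglib call in the question expects a rectangular double[,], so this only helps if the eigenvalue step can be made to work on the jagged form or on chunks of it.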

How do I retrieve the list from said pointer?

The MSDN document that I am trying to follow is located here. Basically I am trying to figure out in C# how to read that pointer into a list of the DHCP_OPTION_DATA structures.
I have the following code but I don't think that it is the proper way to do this.
DHCP_OPTION_ARRAY optionArray = (DHCP_OPTION_ARRAY)Marshal.PtrToStructure(options, typeof(DHCP_OPTION_ARRAY));
List<DHCP_OPTION> allOptions = new List<DHCP_OPTION>();
for (int i = 0; i < optionArray.NumElements; i++)
{
    DHCP_OPTION option = (DHCP_OPTION)Marshal.PtrToStructure(optionArray.Options, typeof(DHCP_OPTION));
    allOptions.Add(option);
    optionArray.Options = (IntPtr)((int)optionArray.Options + (int)Marshal.SizeOf(option));
}
Since I can't Marshal the pointer into a generic list collection, I tried it this way. My problem is that I am getting skewed results depending on how much I increase the IntPtr by. Initially I was doing this:
optionArray.Options = (IntPtr)((int)optionArray.Options + (int)Marshal.SizeOf(typeof(DHCP_OPTION_DATA)));
However, I then realized that the next element would be located after the size of the actual option.
So the question still remains, how do I Marshal a Ptr to a list of structures?
EDIT 1
I posted the wrong article; it is fixed now.
EDIT 2
Although both answers were great, I chose the one I did because it addressed my lack of understanding of how the data was being handled behind the scenes when marshaling the information.
Is the first option object you get correct?
If so, the reason for the rest being skewed is most likely the alignment of the structure.
You could try to find the correct alignment, for example:
var offset = (int)Marshal.SizeOf(typeof(DHCP_OPTION_DATA));
var alignment = 4;
var remainder = offset % alignment;
if (remainder != 0)
    offset += alignment - remainder;
optionArray.Options = (IntPtr)((int)optionArray.Options + offset);
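One more thing to watch: the (int) casts on the IntPtr will truncate addresses in a 64-bit process. Here is a hedged sketch of the same walk using IntPtr.Add instead; MyElement is a stand-in for DHCP_OPTION, not the real structure:

using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct MyElement
{
    public uint Id;
    public uint Value;
}

static class UnmanagedArrayReader
{
    public static List<MyElement> ReadAll(IntPtr first, int count)
    {
        var result = new List<MyElement>(count);
        // Marshal.SizeOf already accounts for the structure's packing/padding.
        int stride = Marshal.SizeOf(typeof(MyElement));
        IntPtr cursor = first;
        for (int i = 0; i < count; i++)
        {
            result.Add((MyElement)Marshal.PtrToStructure(cursor, typeof(MyElement)));
            // IntPtr.Add avoids the lossy (int) cast, which would truncate
            // addresses in a 64-bit process.
            cursor = IntPtr.Add(cursor, stride);
        }
        return result;
    }
}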
Here is a paper Jason Rupard wrote using the DHCP_OPTION_ARRAY...
http://www.rupj.net/portfolio/docs/dws-writeup.pdf
Looks like he has everything you need and more... :)
Although looking at it you could define the structure a little differently and have it automatically turned into an array upon deserialization if you get the Pack attribute right.

Simple round robin (moving average) array in C#

As a diagnostic, I want to display the number of cycles per second in my app. (Think frames-per-second in a first-person-shooter.)
But I don't want to display the most recent value, or the average since launch. What I want to calculate is the mean of the last X values.
My question is, I suppose, about the best way to store these values. My first thought was to create a fixed size array, so each new value would push out the oldest. Is this the best way to do it? If so, how would I implement it?
EDIT:
Here's the class I wrote: RRQueue. It inherits Queue, but enforces the capacity and dequeues if necessary.
EDIT 2:
Pastebin is so passé. Now on a GitHub repo.
The easiest option for this is probably to use a Queue<T>, as this provides the first-in, first-out behavior you're after. Just Enqueue() your items, and when you have more than X items, Dequeue() the extra item(s).
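A minimal sketch of that approach (the MovingAverage class and its capacity handling are illustrative, not a library type):

using System;
using System.Collections.Generic;
using System.Linq;

class MovingAverage
{
    private readonly Queue<double> _values = new Queue<double>();
    private readonly int _capacity;

    public MovingAverage(int capacity)
    {
        _capacity = capacity;
    }

    public double Add(double value)
    {
        _values.Enqueue(value);
        if (_values.Count > _capacity)
            _values.Dequeue(); // push out the oldest sample
        return _values.Average();
    }
}

Each cycle you would call something like avg.Add(cyclesPerSecond) and display the returned mean of the last X values.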
Possibly use a filter:
average = 0.9*average + 0.1*value
where 'value' is the most recent measurement
Vary the 0.9 and 0.1 (as long as the two sum to 1).
This is not exactly an average, but it does filter out spikes, transients, etc., and it does not require an array for storage.
Greetings,
Karel
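A minimal sketch of Karel's filter (the ExponentialAverage name is illustrative; alpha = 0.1 reproduces the 0.9/0.1 weights above):

class ExponentialAverage
{
    private readonly double _alpha;
    private double _average;

    public ExponentialAverage(double alpha)
    {
        _alpha = alpha;
    }

    public double Update(double value)
    {
        // average = (1 - alpha) * average + alpha * value
        _average = (1 - _alpha) * _average + _alpha * value;
        return _average;
    }
}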
If you need the fastest implementation, then yes, a fixed-size array with a separate count would be fastest.
You should take a look at the performance monitoring built into Windows :D.
MSDN
The API will feel a bit wonky if you haven't played with it before, but it's fast, powerful, extensible, and it makes quick work of getting usable results.
my implementation:
// Requires using System.Linq; for Sum().
class RoundRobinAverage
{
    int[] buffer;
    byte _size;
    byte _idx = 0;

    public RoundRobinAverage(byte size)
    {
        _size = size;
        buffer = new int[size];
    }

    public double Calc(int probeValue)
    {
        buffer[_idx++] = probeValue;
        if (_idx >= _size)
            _idx = 0;
        // Cast to double so the division doesn't truncate.
        return (double)buffer.Sum() / _size;
    }
}
usage:
private RoundRobinAverage avg = new RoundRobinAverage(10);
...
var average = avg.Calc(123);

C# byte[] substring? (design)

I'm downloading some files asynchronously into a large byte array, and I have a callback that fires off periodically whenever some data is added to that array. If I want to give developers the ability to use the last chunk of data that was added to array, then... well how would I do that? In C++ I could give them a pointer to somewhere in the middle, and then perhaps tell them the number of bytes that were added in the last operation so they at least know the chunk they should be looking at... I don't really want to give them a 2nd copy of that data, that's just wasteful.
I'm just thinking if people want to process this data before the file has completed downloading. Would anyone actually want to do that? Or is it a useless feature anyway? I already have a callback for when the buffer (entire byte array) is full, and then they can dump the whole thing without worrying about start and end points...
.NET has a struct that does exactly what you want:
System.ArraySegment.
In any case, it's easy to implement it yourself too - just make a constructor that takes a base array, an offset, and a length. Then implement an indexer that offsets indexes behind the scenes, so your ArraySegment can be seamlessly used in the place of an array.
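A minimal sketch of handing callers the newest chunk as an ArraySegment<byte> view over the same buffer (the sizes below are made up for illustration):

using System;

class SegmentDemo
{
    static void Main()
    {
        byte[] buffer = new byte[1024]; // the large download buffer
        int previousLength = 100;       // bytes present before the last write
        int bytesJustAdded = 50;        // size of the most recent chunk

        // A view over the same array - no second copy of the data.
        var lastChunk = new ArraySegment<byte>(buffer, previousLength, bytesJustAdded);
        Console.WriteLine(String.Format("Chunk covers [{0}, {1})",
            lastChunk.Offset, lastChunk.Offset + lastChunk.Count));
    }
}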
You can't give them a pointer into the array, but you could give them the array and start index and length of the new data.
But I have to wonder what someone would use this for. Is this a known need, or are you just guessing that someone might want this someday? And if so, is there any reason why you couldn't wait to add the capability once someone actually needs it?
Whether this is needed or not depends on whether you can afford to accumulate all the data from a file before processing it, or whether you need to provide a streaming mode where you process each chunk as it arrives. This depends on two things: how much data there is (you probably would not want to accumulate a multi-gigabyte file), and how long it takes the file to completely arrive (if you are getting the data over a slow link you might not want your client to wait till it had all arrived). So it is a reasonable feature to add, depending on how the library is to be used.
Streaming mode is usually a desirable attribute, so I would vote for implementing the feature. However, the idea of putting the data into an array seems wrong, because it fundamentally implies a non-streaming design, and because it requires an additional copy. What you could do instead is to keep each chunk of arriving data as a discrete piece. These could be stored in a container for which adding at the end and removing from the front is efficient, as in the sketch below.
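A hedged sketch of that container idea (the ChunkStream name and methods are illustrative); a Queue<byte[]> gives cheap append-at-end and remove-from-front:

using System.Collections.Generic;

class ChunkStream
{
    private readonly Queue<byte[]> _chunks = new Queue<byte[]>();

    // Producer side: each arriving chunk stays a discrete piece - no copying
    // into one big accumulating array.
    public void OnChunkArrived(byte[] chunk)
    {
        _chunks.Enqueue(chunk);
    }

    // Consumer side: pull chunks in arrival order; Dequeue from the front is O(1).
    public bool TryProcessNext(out byte[] chunk)
    {
        if (_chunks.Count > 0)
        {
            chunk = _chunks.Dequeue();
            return true;
        }
        chunk = null;
        return false;
    }
}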
Copying a chunk of a byte array may seem "wasteful," but then again, object-oriented languages like C# tend to be a little more wasteful than procedural languages anyway. A few extra CPU cycles and a little extra memory consumption can greatly reduce complexity and increase flexibility in the development process. In fact, copying bytes to a new location in memory to me sounds like good design, as opposed to the pointer approach which will give other classes access to private data.
But if you do want to use pointers, C# does support them. Here is a decent-looking tutorial. The author is correct when he states, "...pointers are only really needed in C# where execution speed is highly important."
I agree with the OP: sometimes you just plain need to pay some attention to efficiency. I don't think the example of providing an API is the best, because that certainly calls for leaning toward safety and simplicity over efficiency.
However, a simple example is when processing large numbers of huge binary files that have zillions of records in them, such as when writing a parser. Without using a mechanism such as System.ArraySegment, the parser becomes a big memory hog, and is greatly slowed down by creating a zillion new data elements, copying all the memory over, and fragmenting the heck out of the heap. It's a very real performance issue. I write these kinds of parsers all the time for telecommunications stuff which generate millions of records per day in each of several categories from each of many switches with variable length binary structures that need to be parsed into databases.
Using the System.ArraySegment mechanism versus creating new structure copies for each record tremendously speeds up the parsing, and greatly reduces the peak memory consumption of the parser. These are very real advantages because the servers run multiple parsers, run them frequently, and speed and memory conservation = very real cost savings in not having to have so many processors dedicated to the parsing.
System.ArraySegment is very easy to use. Here's a simple example of providing a base way to track the individual records in a typical big binary file full of records with a fixed-length header and a variable-length record size (obvious exception control deleted):
using System;
using System.Collections.Generic;
using System.IO;

public struct MyRecord
{
    public ArraySegment<byte> header;
    public ArraySegment<byte> data;
}

public class Parser
{
    const int HEADER_SIZE = 10;
    const int HDR_OFS_REC_TYPE = 0;
    const int HDR_OFS_REC_LEN = 4;

    byte[] m_fileData;
    List<MyRecord> records = new List<MyRecord>();

    bool Parse(FileStream fs)
    {
        int fileLen = (int)fs.Length;
        m_fileData = new byte[fileLen];
        fs.Read(m_fileData, 0, fileLen);
        fs.Close();
        fs.Dispose();

        int offset = 0;
        while (offset + HEADER_SIZE < fileLen)
        {
            int recType = (int)m_fileData[offset];
            switch (recType) { /* puke if not a recognized type */ }

            int varDataLen = ((int)m_fileData[offset + HDR_OFS_REC_LEN]) * 256
                + (int)m_fileData[offset + HDR_OFS_REC_LEN + 1];
            if (offset + varDataLen > fileLen) { /* puke as file has odd bytes at end */ }

            MyRecord rec = new MyRecord();
            rec.header = new ArraySegment<byte>(m_fileData, offset, HEADER_SIZE);
            rec.data = new ArraySegment<byte>(m_fileData, offset + HEADER_SIZE, varDataLen);
            records.Add(rec);

            offset += HEADER_SIZE + varDataLen;
        }
        return true;
    }
}
The above example gives you a list with ArraySegments for each record in the file while leaving all the actual data in place in one big array per file. The only overhead are the two array segments in the MyRecord struct per record. When processing the records, you have the MyRecord.header.Array and MyRecord.data.Array properties which allow you to operate on the elements in each record as if they were their own byte[] copies.
I think you shouldn't bother.
Why on earth would anyone want to use it?
That sounds like you want an event.
public class ArrayChangedEventArgs : EventArgs
{
    public ArrayChangedEventArgs(byte[] array, int start, int length)
    {
        Array = array;
        Start = start;
        Length = length;
    }

    public byte[] Array { get; private set; }
    public int Start { get; private set; }
    public int Length { get; private set; }
}

// ...
// and in your class:
public event EventHandler<ArrayChangedEventArgs> ArrayChanged;

protected virtual void OnArrayChanged(ArrayChangedEventArgs e)
{
    // Using a temporary variable avoids a common potential multithreading issue
    // where the multicast delegate changes midstream.
    // Best practice is to grab a copy first, then test for null.
    EventHandler<ArrayChangedEventArgs> handler = ArrayChanged;
    if (handler != null)
    {
        handler(this, e);
    }
}

// Finally, your code that downloads a chunk just needs to call OnArrayChanged()
// with the appropriate args.
Clients hook into the event and get called when things change. This is what most client code in .NET expects to have in an API ("call me when something happens"). They can hook into the code with something as simple as:
yourDownloader.ArrayChanged += (sender, e) =>
    Console.WriteLine(String.Format("Just downloaded {0} byte{1} at position {2}.",
        e.Length, e.Length == 1 ? "" : "s", e.Start));
