I have some code in a thread. The code's main function is to call other methods, which write stuff to a SQL database like this:
private void doWriteToDb()
{
    while (true)
    {
        try
        {
            if (q.Count == 0) qFlag.WaitOne();

            PFDbItem dbItem = null;
            lock (qLock)
            {
                dbItem = q.Dequeue();
            }

            if (dbItem != null)
            {
                System.Diagnostics.Stopwatch sw = System.Diagnostics.Stopwatch.StartNew();

                //write it off
                PFResult result = dbItem.Result;
                double frequency = dbItem.Frequency;
                int i = dbItem.InChannel;
                int j = dbItem.OutChannel;
                long detail, param, reading, res;

                detail = PFCoreMethods.AddNewTestDetail(DbTables, _dicTestHeaders[result.Name.ToString().ToLower()]);

                param = PFCoreMethods.AddNewTestParameter(DbTables, detail, "Frequency");
                PFCoreMethods.AddNewTestParameterValue(DbTables, param, frequency.ToString());

                param = PFCoreMethods.AddNewTestParameter(DbTables, detail, "In channel");
                PFCoreMethods.AddNewTestParameterValue(DbTables, param, i.ToString());

                param = PFCoreMethods.AddNewTestParameter(DbTables, detail, "Out channel");
                PFCoreMethods.AddNewTestParameterValue(DbTables, param, j.ToString());

                param = PFCoreMethods.AddNewTestParameter(DbTables, detail, "Spec");
                PFCoreMethods.AddNewTestParameterValue(DbTables, param, result.Spec);

                dbItem.Dispose();

                dqcnt++;
                sw.Stop();
            }
        }
        catch (Exception ex)
        {
        }
    }
}
The AddNewTestParameter method is using a 3rd party class which has the SQL code. Currently I have no access to its internals.
DbTables is a collection object whose properties are table objects created by the 3rd party program. There is only one DbTable object which the program uses.
The problem is that as time passes (couple of hours) the AddNewTestParameter method call takes longer and longer, starting from about 10ms to about 1sec.
q is a queue of objects that contain the information needed to write to the database. Items are added to this queue by the main thread. The worker thread simply takes them out, writes them, and disposes of them. q.Count is normally no more than 1, but over time, as the database writes become slower, q.Count rises because dequeueing cannot keep up. At its worst, q.Count was over 30,000. In total I write over 150,000 entries to the database.
On the SQL end, I ran some traces on the server, and the trace shows that internally SQL always takes about 10ms, even during the time the C# code itself takes 1sec.
So, currently, I have 2 suspicions:
My code is the problem. The thread is low priority; perhaps that affects performance. Also, after watching memory usage for 20 minutes, I see it rising at about 100K/min, while CPU usage stays roughly constant at 2-5%. How can I figure out where the memory leak happens? Can I pinpoint it to a specific part of the code?
The 3rd party code is the problem. How could I go about proving this? What methods are there to watch and confirm that the problem lies in the 3rd party code?
Anyway, if I had to make a suggestion, I would look at DbTables... if that's a collection, maybe you're forgetting to reset it, so every time you call it it has one more element... so after a while a 3rd party routine that's O(n^2), or something like that, starts to degrade because it's expecting a worst-case scenario of 20 tables and you're providing 1000.
Edit: Ok, I would discard the problem being in the queue, as dequeuing should be a really fast operation (you can measure it anyway). That still points to the DbTables collection growing bigger and bigger; have you checked its size after the first x iterations?
Edit2: Ok, another approach: let's say AddNewTestParameter does exactly what it says it does... ADDs a new parameter that then gets added to an internal collection. If that's the case, there are two options: either you're supposed to clear that collection by calling a "ClearParameters" function after each iteration, and then it's your fault, or there is no such functionality, and then it's the 3rd party code's fault. That would also explain your memory losses (although those can also be related to the growing queue).
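A minimal measurement sketch along those lines, reusing identifiers from the question's code: time the 3rd party calls separately and log the queue depth and managed heap once per iteration. If the per-call time climbs while q.Count stays small, the slowdown is inside the library; if GC.GetTotalMemory climbs steadily, there is managed-memory growth to chase. Logging DbTables's size the same way would confirm the growth theory, but what member the 3rd party collection exposes for that is an assumption, so use whatever it actually provides.
// Sketch only: per-iteration timing to separate "my code" from the 3rd party calls.
var swDetail = System.Diagnostics.Stopwatch.StartNew();
detail = PFCoreMethods.AddNewTestDetail(DbTables, _dicTestHeaders[result.Name.ToString().ToLower()]);
swDetail.Stop();

var swParam = System.Diagnostics.Stopwatch.StartNew();
param = PFCoreMethods.AddNewTestParameter(DbTables, detail, "Frequency");
swParam.Stop();

// Trace output can be collected with a listener or DebugView without attaching a debugger.
System.Diagnostics.Trace.WriteLine(string.Format(
    "AddNewTestDetail={0}ms AddNewTestParameter={1}ms qCount={2} managedBytes={3}",
    swDetail.ElapsedMilliseconds, swParam.ElapsedMilliseconds,
    q.Count, GC.GetTotalMemory(false)));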
Related
I am trying to build some objects and insert them into a database. The number of records that have to be inserted is big ~ millions.
The insert is done in batches.
The problem I am having is that I need to initialize new objects to add them to a list and, at the end, I do a bulk insert of the list into the database. Because I am initializing a huge number of objects, my computer memory (RAM) gets filled up and everything pretty much freezes.
The question is:
From a memory point of view, should I initialize new objects or set them to null?
Also, I am trying to work with the same object reference. Am I doing it right?
Code:
QACompleted completed = new QACompleted();
QAUncompleted uncompleted = new QAUncompleted();
QAText replaced = new QAText();

foreach (QAText question in questions)
{
    MatchCollection matchesQ = rgx.Matches(question.Question);
    MatchCollection matchesA = rgx.Matches(question.Answer);

    foreach (GetKeyValues_Result item in values)
    {
        hasNull = false;
        replaced = new QAText(); // <- this object

        if (matchesQ.Count > 0)
        {
            SetQuestion(matchesQ, replaced, question, item);
        }
        else
        {
            replaced.Question = question.Question;
        }

        if (matchesA.Count > 0)
        {
            SetAnswer(matchesA, replaced, question, item);
        }
        else
        {
            replaced.Answer = question.Answer;
        }

        if (!hasNull)
        {
            if (matchesA.Count == 0 && matchesQ.Count == 0)
            {
                completed = new QACompleted(); // <- this object
                MapEmpty(replaced, completed, question.Id);
            }
            else
            {
                completed = new QACompleted(); // <- this object
                MapCompleted(replaced, completed, question.Id, item);
            }
            goodResults.Add(completed);
        }
        else
        {
            uncompleted = new QAUncompleted(); // <- this object
            MapUncompleted(replaced, uncompleted, item, question.Id);
            badResults.Add(uncompleted);
        }
    }

    var success = InsertIntoDataBase(goodResults, "QACompleted");
    var success1 = InsertIntoDataBase(badResults, "QAUncompleted");
}
I have marked the objects. Should I just set them to null, like replaced = null, or should I use the constructor?
What would be the difference between = new QAText() and = null?
The memory cost of creating objects
Creating objects in C# always has a memory cost. This relates to the memory layout of an object. Assuming you are using a 64-bit OS, the runtime has to allocate an extra 8 bytes for the sync block and 8 bytes for the method table pointer. After the sync block and method table pointer come your own data fields. Besides the unavoidable 16-byte header, objects are always aligned to an 8-byte boundary and can therefore incur extra padding overhead.
You can roughly estimate the memory overhead if you know exactly how many objects you create. However, I would suggest being careful about assuming that your memory pressure comes from object layout overhead; this is also why I suggest estimating the overhead as a first step. You might end up realizing that even if the layout overhead could magically be removed completely, it would not make a huge difference to memory usage. After all, for a million objects, the object header overhead is only about 16 MB.
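As a rough illustration of that back-of-envelope estimate (the object count is assumed, not measured from your program):
// Rough estimate of object header overhead on 64-bit .NET:
// 8 bytes sync block + 8 bytes method table pointer per object, 8-byte aligned.
long objectCount = 1000000;           // assumed: one million objects
long headerBytesPerObject = 16;
long totalHeaderBytes = objectCount * headerBytesPerObject; // 16,000,000 bytes, roughly 15 MB
Console.WriteLine("Approximate header overhead: " + (totalHeaderBytes / (1024 * 1024)) + " MB");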
The difference between replaced = new QAText() and replaced = null
I suppose after you set replaced to null you still have to create another QAText()? If so, memory-wise there is no real difference to the garbage collector. The old QAText instance will be collected either way if you are not keeping any other reference to it. When to collect the instance, however, is the garbage collector's call. Doing replaced = null will not make the collection happen earlier.
You can try to reuse the same QAText instance instead of creating a new one every time, but creating a new one every time will not result in high memory pressure. It will make the GC a little busier and therefore result in slightly higher CPU usage.
Identify the real cause for high memory usage
If your application is really using a lot of memory, you have to look at the design of your QACompleted and QAUncompleted objects. Those are the objects added to the lists, and they occupy memory until you submit them to the database. If those objects are designed well (they only take the memory they have to take), then, as Peter pointed out, you should use a smaller batch size so you don't have to keep too many of them in memory.
There are other factors in your program that can possibly cause unexpected memory usage. What is the data structure for goodResults and badResults? Are they List or LinkedList? List is internally nothing but a dynamic array. It uses a growth policy that always doubles its size when it is full. The always-double policy can eat up memory quickly, especially when you have a lot of entries.
LinkedList, on the other hand, does not suffer from the above-mentioned problem, but every single node requires roughly 40 extra bytes.
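If you know roughly how many results a batch will hold, you can also sidestep the double-and-copy growth by reserving capacity up front. A small sketch of how the lists could be declared; the batch size here is only an assumed figure:
// Reserving capacity avoids the repeated double-and-copy growth described above.
int expectedBatchSize = 100000; // assumed, tune to your real batch size
var goodResults = new List<QACompleted>(expectedBatchSize);
var badResults = new List<QAUncompleted>(expectedBatchSize);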
It is also worth checking what the MapCompleted and MapUncompleted methods are doing. Are they keeping a long-lived reference to the replaced object? If so, that will cause a memory leak.
As a summary, when dealing with memory problems, you should focus on macro-scope issues such as the choice of data structures, or memory leaks. Or optimize your algorithms so that you don't have to keep all the data in memory all the time.
Instantiating a new (albeit empty) object always takes some memory, as space has to be allocated for the object's fields. If you aren't going to access or set any data in the instance, I see no point in creating it.
It's unfortunate that the code example is not written better. There seem to be lots of declarations left out, and undocumented side-effects in the code. This makes it very hard to offer specific advice.
That said…
Your replaced object does not appear to be retained beyond one iteration of the loop, so it's not part of the problem. The completed and uncompleted objects are added to lists, so they do add to your memory consumption. Likewise the goodResults and badResults lists themselves (where are the declarations for those?).
If you are using a computer with too little RAM, then yes...you'll run into performance issues as Windows uses the disk to make up for the lack of RAM. And even with enough RAM, at some point you could run into .NET's limitations with respect to object size (i.e. you can only put so many elements into a list). So one way or the other, you seem to need to reduce your peak memory usage.
You stated that when the data in the lists is inserted into the database, the lists are cleared. So presumably that means that there are so many elements in the values list (one of the undeclared, undocumented variables in your code example) that the lists and their objects get too large before getting to the end of the inner loop and inserting the data into the database.
In that case, then it seems likely the simplest way to address the issue is to submit the updates in batches inside the inner foreach loop. E.g. at the end of that loop, add something like this:
if (goodResults.Count >= 100000)
{
    var success = InsertIntoDataBase(goodResults, "QACompleted");
}
if (badResults.Count >= 100000)
{
    var success = InsertIntoDataBase(badResults, "QAUncompleted");
}
(Declaring the actual cut-off as a named constant of course, and handling the database insert result return value as appropriate).
Of course, you would still do the insert at the end of the outer loop too.
Here is the sample code I am using to fetch data from the database:
On the DAO layer:
public IEnumerable<IDataRecord> GetDATA(ICommonSearchCriteriaDto commonSearchCriteriaDto)
{
    using (DbContext)
    {
        DbDataReader reader = DbContext.GetReader("ABC_PACKAGE.GET_DATA", oracleParams.ToArray(), CommandType.StoredProcedure);
        while (reader.Read())
        {
            yield return reader;
        }
    }
}
On the BO layer I am calling the above method like this:
List<IGridDataDto> GridDataDtos = MapMultiple(_costDriversGraphDao.GetGraphData(commonSearchCriteriaDto)).ToList();
On the mapper layer, the MapMultiple method is defined like this:
public IGridDataDto MapSingle(IDataRecord dataRecord)
{
    return new GridDataDto
    {
        Code = Convert.ToString(dataRecord["Code"]),
        Name = Convert.ToString(dataRecord["Name"]),
        Type = Convert.ToString(dataRecord["Type"])
    };
}

public IEnumerable<IGridDataDto> MapMultiple(IEnumerable<IDataRecord> dataRecords)
{
    return dataRecords.Select(MapSingle);
}
The above code is working well, but I have two concerns about it.
How long will the data reader's connection remain open?
Considering performance only, is it a good idea to use 'yield return' instead of adding each record to a list and returning the whole list?
Your code doesn't show where you open/close the connection; but the reader here will actually only be open while you are iterating the data. Deferred execution, etc. The only bit of your code that does this is the .ToList(), so it'll be fine. In the more general case, yes: the reader will be open for the amount of time you take to iterate it; if you do a .ToList() that will be minimal; if you do a foreach and (for every item) make an external http request and wait 20 seconds, then yes - it will be open for longer.
Both have their uses; the non-buffered approach is great for huge results that you want to process as a stream, without ever having to load them into a single in-memory list (or even have all of them in memory at a time); returning a list gets the connection closed quickly and makes it easy to avoid accidentally using the connection while it already has an open reader, but it is not ideal for large results.
If you return an iterator block, the caller can decide what is sane; if you always return a list, they don't have much option. A third way (that we do in dapper) is to make the choice theirs; we have an optional bool parameter which defaults to "return a list", but which the caller can change to indicate "return an iterator block"; basically:
bool buffered = true
in the parameters, and:
var data = QueryInternal<T>(...blah...);
return buffered ? data.ToList() : data;
in the implementation. In most cases, returning a list is perfectly reasonable and avoids a lot of problems, hence we make that the default.
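Put together, the pattern looks roughly like this. This is a sketch of the idea, not dapper's actual code; QueryInternal<T> stands in for whatever method builds the iterator block:
// Caller chooses: buffered (safe default, reader drained and closed quickly) or streaming.
public IEnumerable<T> Query<T>(string sql, bool buffered = true)
{
    var data = QueryInternal<T>(sql);       // deferred execution: nothing has run yet
    return buffered ? data.ToList() : data; // ToList() reads everything and closes the reader now
}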
How long will the data reader's connection remain open?
The connection will remain open until the reader is disposed, which means it will be open until the iteration is over.
Considering performance only, is it a good idea to use yield return instead of adding each record to a list and returning the whole list?
This depends on several factors:
If you are not planning to fetch the entire result, yield return will help you save on the amount of data transferred on the network
If you are not planning to convert returned data to objects, or if multiple rows are used to create a single object, yield return will help you save on the memory used at the peak usage point of your program
If you plan to iterate the entire result set over a short period of time, there will be no performance penalty for using yield return. If the iteration is going to last for a significant amount of time on multiple concurrent threads, the limit on open cursors on the RDBMS side may be exceeded.
This answer ignores flaws in the shown implementation and covers the general idea.
It is a tradeoff; it is impossible to tell whether it is a good idea without knowing the constraints of your system: the amount of data you expect to get, the memory consumption you are willing to accept, the expected load on the database, etc.
I have tried to implement the following algorithm using Parallel.ForEach. I thought it would be trivial to make parallel, since it has no synchronization issues. It is basically a Monte-Carlo tree search, where I explore every child in parallel. The Monte-Carlo part is not really important; all you have to know is that I have a method which works on a tree, and which I call with Parallel.ForEach on the root's children. Here is the snippet where the parallel call is made.
public void ExpandParallel(int time, Func<TGame, TGame> gameFactory)
{
    int start = Environment.TickCount;

    // Creating all of root's children
    while (root.AvailablePlays.Count > 0)
        Expand(root, gameInstance);

    // Create the children games
    var games = root.Children.Select(c =>
    {
        var g = gameFactory(gameInstance);
        c.Play.Apply(g.Board);
        return g;
    }).ToArray();

    // Create a task to expand each child
    Parallel.ForEach(root.Children, (tree, state, i) =>
    {
        var game = games[i];

        // Make sure we don't waste time
        while (Environment.TickCount - start < time && !tree.Completed)
            Expand(tree, game);
    });

    // Update (reset) the root data
    root.Wins = root.Children.Sum(c => c.Wins);
    root.Plays = root.Children.Sum(c => c.Plays);
    root.TotalPayoff = root.Children.Sum(c => c.TotalPayoff);
}
The Func<TGame, TGame> delegate is a cloning factory, so that each child has its own clone of the game state. I can explain the internals of the Expand method if required, but I can assure you that it only accesses the state of the current sub-tree and game instances, and there are no static members in any of those types. I thought Environment.TickCount might be causing contention, but I ran an experiment just calling Environment.TickCount inside a Parallel.ForEach loop and got nearly 100% processor usage.
I only get 45% to 50% use on a Core i5.
This is a common symptom of GC thrashing. Without knowing more about what you're doing inside the Expand method, my best guess is that this is your root cause. It's also possible that shared data access is the culprit, either by calling out to a remote system or by locking access to shared resources.
Before you do anything, you need to determine the exact cause with a profiler or other tool. Don't guess, as this will just waste your time, and don't wait for an answer here, as without your complete program it cannot be answered. As you already know from experimentation, there is nothing in Parallel.ForEach itself that would cause this.
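If you want a quick sanity check before reaching for a profiler, comparing GC collection counts around the parallel region will at least tell you whether the GC-thrashing guess is plausible. A sketch only; a profiler is still the right tool for the real answer:
// Rapidly climbing counts, especially gen 2, point at allocation pressure inside Expand.
int gen0 = GC.CollectionCount(0), gen1 = GC.CollectionCount(1), gen2 = GC.CollectionCount(2);

ExpandParallel(time, gameFactory); // the call from the question, with whatever arguments you already pass

Console.WriteLine("GC collections during expand: gen0={0} gen1={1} gen2={2}",
    GC.CollectionCount(0) - gen0,
    GC.CollectionCount(1) - gen1,
    GC.CollectionCount(2) - gen2);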
My problem: earlier this week, I got the task to speed up a task in our program. I looked at it and immediately got the idea of using a parallel foreach loop for a function in that task.
I implemented it, went through the function (including all sub-functions) and changed the SqlConnections (and other stuff) so it would be able to run in parallel. I started the whole thing and all went well and fast (that alone reduced the time for that task by ~45%).
Now, yesterday we wanted to try the same thing with some more data and... I got a weird problem: whenever the parallel function got called, it did its work... but sometimes one of the threads would hang for at least 4 minutes (timeouts are set to one minute, for both connection and command).
If I pause the program during that, I see that only one thread is still active from that loop and it hangs on
connection.Open()
After ~4 minutes the program simply proceeds without throwing an error (aside from a message in the Output box saying that an exception occurred somewhere, but it wasn't caught by my application; it happened somewhere inside the SqlConnection/SqlCommand object).
I can kill all connections on the MSSQL server without anything happening; the MSSQL server also does nothing during those 4 minutes, all connections are idle.
This is the procedure that is used for sending Update/Insert/Delete statements to the database:
int i = 80;
bool itDidntWork = true;
Random random = new Random();
while (itDidntWork && i > 0)
{
    try
    {
        using (SqlConnection connection = new SqlConnection(sqlConnectionString))
        {
            connection.Open();
            lock (connection)
            {
                command.Connection = connection;
                command.ExecuteNonQuery();
            }
            itDidntWork = false;
        }
    }
    catch (Exception ex)
    {
        if (ex is SqlException && ((SqlException)ex).ErrorCode == -2146232060)
        {
            Thread.Sleep(random.Next(500, 5000));
        }
        else
        {
            SqlConnection.ClearAllPools();
        }
        Thread.Sleep(random.Next(50, 110));
        i--;
        if (i == 0)
        {
            writeError(ex);
        }
    }
}
Just in case: on smaller databases deadlocks can occur (error number 2146232060), so if one occurs, I have to make the colliding statements run at different times. Works great even on small databases/small servers. If the error wasn't caused by a deadlock, chances are that the connection was faulty, so I'm clearing all broken connections.
Similar functions exist for executing scalars, filling datatables/datasets (yes, the application is that old) and executing stored procedures.
And yes all of those are used in the parallel loop.
Has someone any idea what could be going on there? Or an idea on how I can find out what is going on there?
*edit about the command object:
It is passed into the function; the command object is always a new object when it is passed in.
About the lock: if I remove the lock, I get dozens or hundreds of 'connection is closed' or 'connection is already open' errors, because the Open() function just gets a connection from .NET's connection pool. The lock does work as intended.
Example code:
using (SqlCommand deleteCommand = new SqlCommand(sqlStatement))
{
    ExecuteNonQuerySafely(deleteCommand); // that's the function that contains the body I posted above
}
*edit 2
I've to make a correction: It hangs on this
command.Connection = connection;
at least I guess it does, because when I pause the application, the 'next statement' marker is green and sits on
command.ExecuteNonQuery();
saying that that is the statement that'll be executed next.
*edit 3
just to be sure I just started another test without any locks around the connection object...will take some minutes to get the results.
*edit 4
well, I was wrong. I removed the lock statements and... it still worked. Maybe the first time I tried it there was a reused connection or something. Thanks for pointing it out.
*edit 5
I'm getting the feeling that this occurs only with one specific call to a specific database procedure. I don't know why; C#-wise there is no difference between that call and other calls (see edit 6). And since it hadn't executed the statement at that point (I guess. Maybe someone can correct me on that: if, in debug mode, a line is marked green instead of yellow, it hasn't executed that statement yet but is waiting for the statement before that line to finish, is that correct?), it's strange.
*edit 6
There were 3 command objects that were reused the whole time. They were defined above the parallel function. I don't know how bad that is/was. They were only used to call one stored procedure (each of them called a different procedure), of course with different parameters and a new connection (through the above mentioned method).
*edit 7
ok, it's really only when one specific stored procedure is called. Except that it's on the assignment of the connection object that it hangs (next line is marked green).
Trying to figure out what the cause for that is atm.
*edit 8
yay, it just happened at another command. So that was that.
*edit 9
ok. Problem solved. The 'hangs' were actually CommandTimeouts that were set to 10 minutes(!). They were only set on two commands (the one I mentioned in edit 7 and the one I mentioned in edit 8). Since I found both of them while I was restructuring my commands to work the way devundef suggested, I marked his answer as the one that solved my problem. Also, his suggestion of limiting the number of threads my for-loop was using sped up the process even more.
Special thanks to Marc Gravell for explaining things and hanging in here with me on a Saturday ;)
I think the problem can be found in your edit 6: "...3 command objects were reused the whole time."
Any data that's used inside a parallel loop must be created inside the loop or it must have the proper synchronization code in place to ensure only 1 thread at a time has access to that particular object. I don't see such code inside the ExecuteNonQuerySafely.
a) Locking the connection has no effect there because the connection object is created inside the method.
b) Locking the command will not guarantee thread safety - probably you're setting the command parameters before locking it inside the method. A lock(command) would work if you locked the command before calling ExecuteNonQuerySafely; however, locks inside a parallel loop are not a good thing to do - it's the definition of anti-parallel, after all. Better to avoid this altogether and create a new command for each iteration. Better yet would be to do a little refactoring on ExecuteNonQuerySafely so it accepts a callback action instead of an SqlCommand. Example:
public void ExecuteCommandSafely(Action<SqlCommand> callback) {
    // ... do init stuff ...
    using (var connection = new SqlConnection(/* ... */)) {
        using (var command = new SqlCommand()) {
            command.Connection = connection;
            connection.Open(); // open before handing the command to the callback
            try {
                callback(command);
            }
            catch (Exception ex) {
                // ... error handling stuff ...
            }
        }
    }
}
And use:
ExecuteCommandSafely((command) => {
    command.CommandText = "...";
    // ... set parameters ...
    command.ExecuteNonQuery();
});
Last, the fact that you're getting errors executing the commands in parallel is a sign that maybe parallel execution is not a good thing to do in this case. You're wasting server resources to get errors. Connections are expensive; try using the MaxDegreeOfParallelism option to tune the workload for this particular loop (remembering that the optimal value will change according to the hardware/server/network/etc.). The Parallel.ForEach method has an overload that accepts a ParallelOptions parameter where you can set how many threads you want to execute in parallel for that particular loop
(http://msdn.microsoft.com/en-us/library/system.threading.tasks.paralleloptions.maxdegreeofparallelism.aspx).
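A sketch of what that looks like; workItems and item.Sql are hypothetical stand-ins for your own loop source, and 4 is only an illustrative degree of parallelism to tune against your hardware and server:
// Limit concurrency so the server is not flooded with simultaneous connections.
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(workItems, options, item =>
{
    // New connection and command per iteration, as suggested above.
    using (var connection = new SqlConnection(sqlConnectionString))
    using (var command = new SqlCommand(item.Sql, connection))
    {
        connection.Open();
        command.ExecuteNonQuery();
    }
});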
EDIT: Problem wasn't related to the question. It was indeed something wrong with my code, and actually, it was so simple that I don't want to put it on the internet. Thanks anyway.
I read in roughly 550k Active Directory records and store them in a List, the class being a simple wrapper for an AD user. I then split the list of ADRecords into four lists, each containing a quarter of the total. After I do this, I read about 400k records from a database, known as EDR records, into a DataTable. I take the four quarters of my list and spawn four threads, passing each one a quarter. I have to match the AD records to the EDR records using email for now, but we plan to add more things to match on later.
I have a foreach on the list of AD records, and inside of that, I have to run a for loop on the EDR records to check each one, because if an AD record matches more than one EDR record, then that isn't a direct match, and should not be treated as a direct match.
My problem: by the time I get to this foreach on the list, my ADRecords list only has about 130 records in it, even though right after I pull them all in, I Console.WriteLine the count and it's 544k.
I am starting to think that even though I haven't set the list to null to be collected later, C# or Windows or something is actually taking my list away to make room for the EDR records because I haven't used the list in a while. The database that I have to use to read EDR records is a linked server, so it takes about 10 minutes to read them all in, so my list is actually idle for 10 minutes, but it's never set to null.
Any ideas?
//splitting list and passing in values to threads.
List<ADRecord> adRecords = GetAllADRecords();

for (int i = 0; i < adRecords.Count / 4; i++)
{
    firstQuarter.Add(adRecords[i]);
}
for (int i = adRecords.Count / 4; i < adRecords.Count / 2; i++)
{
    secondQuarter.Add(adRecords[i]);
}
for (int i = adRecords.Count / 2; i < (adRecords.Count / 4) * 3; i++)
{
    thirdQuarter.Add(adRecords[i]);
}
for (int i = (adRecords.Count / 4) * 3; i < adRecords.Count; i++)
{
    fourthQuarter.Add(adRecords[i]);
}

DataTable edrRecordsTable = GetAllEDRRecords();
DataRow[] edrRecords = edrRecordsTable.Select("Email_Address is not null and Email_Address <> ''", "Email_Address");
Dictionary<string, int> letterPlaces = FindLetterPlaces(edrRecords);

Thread one = new Thread(delegate() { ProcessMatches(firstQuarter, edrRecords, letterPlaces); });
Thread two = new Thread(delegate() { ProcessMatches(secondQuarter, edrRecords, letterPlaces); });
Thread three = new Thread(delegate() { ProcessMatches(thirdQuarter, edrRecords, letterPlaces); });
Thread four = new Thread(delegate() { ProcessMatches(fourthQuarter, edrRecords, letterPlaces); });

one.Start();
two.Start();
three.Start();
four.Start();
In ProcessMatches, there is a foreach over the list of ADRecords passed in. The first line in the foreach is AdRecordsProcessed++ (a global static int), and the program finishes with it at 130 instead of 544k.
The variable is never set to null and is still in scope? If so, it shouldn't be collected and idle time isn't your problem.
First issue I see is:
AdRecordsProcessed++;
Are you locking that global variable before updating it? If not, and depending on how fast the records are processed, it's going to be lower than you expect.
Try running it from a single thread (i.e. pass in adRecords instead of firstQuarter and don't start the other threads). Does it work as expected with 1 thread?
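If the counter does stay shared across threads, the increment needs to be atomic. A one-line sketch using Interlocked; lost increments alone won't explain 544k dropping to 130, but it removes one source of confusion while you debug:
// Thread-safe replacement for AdRecordsProcessed++ on the shared static int.
System.Threading.Interlocked.Increment(ref AdRecordsProcessed);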
Firstly, you don't set a list to null. What you might do is set every reference to a list to null (or to another list), or all such references might simply fall out of scope. This may seem like a nitpick point, but if you are having to examine what is happening to your data it's time to be nitpicky on such things.
Secondly, getting the GC to deallocate something that has a live reference is pretty hard to do. You can fake it with a WeakReference<>, or think you've found it when you hit a bug in a finaliser (because the reference isn't actually live, and even then it's a matter of the finaliser trying to deal with a finalised rather than deallocated object). Bugs can happen everywhere, but the chance that you've found a way to make the GC deallocate something that is live is vanishingly small.
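You can see that behaviour with a small sketch (using the non-generic WeakReference for brevity; note that in a Debug build the JIT may keep locals alive to the end of the method, so run it in Release to see the second line change):
var list = new List<ADRecord>();
var weak = new WeakReference(list);

GC.Collect();
Console.WriteLine(weak.IsAlive); // True: the strong reference 'list' keeps it alive

list = null;                     // drop the only strong reference
GC.Collect();
Console.WriteLine(weak.IsAlive); // typically False now: nothing roots the list any more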
The GC will likely do two things with your list:
It is quite likely to compact the memory used by it, which will move its component items around.
It is quite likely to promote it to a higher generation.
Neither of these will produce any change you can detect unless you actually look for it (obviously you'll notice a change in generation if you keep calling GC.GetGeneration(), but aside from that you aren't really going to).
The memory used could also be paged out, but it will be paged back in when you go to use the objects. Again, no effect you will notice.
Finally, if the GC did deallocate something, you wouldn't have a reduced number of items, you'd have a crash, because if objects just got deallocated the system will still try to use the supposedly live references to them.
So, while the GC or the OS may do something to make room for your other object, it isn't something observable in code, and it does not stop the object from being available and in the same programmatic state.
Something else is the problem.
Is there a reason you have to get all the data all at once? If you break the data up into chunks it should be more manageable. All I know is having to get into GC stuff is a little smelly. Best to look at refactoring your code.
The garbage collector will not collect:
A global variable
Objects managed by static objects
A local variable
A variable referencable by any method on the call stack
So if you can reference it from your code, there is no possibility that the garbage collector collected it. No way, no how.
In order for the collector to collect it, all references to it must have gone away. And if you can see it, that's most definitely not the case.