I use a SQL Server CE database for some simulation results. When I test the reading speed, the benchmark times differ greatly.
the data is split across 4 .SDF files (4 quartals, i.e. quarters)
302 * 525,000 entries overall
four SqlCeConnections
they are opened before reading and stay open
all four databases are located on the same disk (SSD)
I use SqlCeDataReader (IMHO as low-level and fast as you can get)
Reading process is parallel
Simplified code:
for (int run = 0; run < 4; run++)
{
    InitializeConnections();
    for (int reading = 0; reading < 6; reading++)
    {
        ResetMemoryObjects();
        Parallel.For(0, quartals.Count, (i) =>
        {
            values[i] = ReadFromSqlCeDb(i);
        });
    }
}
the connections are initialized once per run
all readings take place in a simple for loop and are exactly the same
before each reading, all objects are reinitialized
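For completeness, InitializeConnections boils down to something like this (a simplified sketch; quartalPaths is a placeholder for the actual .sdf paths, while sqlCeCon is the SqlCeConnection array used in the reading code below):

// Simplified sketch: one SqlCeConnection per quartal .sdf, opened once and kept open.
// 'quartalPaths' is a placeholder for the actual file paths.
private static void InitializeConnections()
{
    for (int i = 0; i < quartalPaths.Length; i++)
    {
        sqlCeCon[i] = new SqlCeConnection("Data Source=" + quartalPaths[i]);
        sqlCeCon[i].Open();
    }
}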
These are the benchmark results I get:
At this point I'll be honest: I have no idea why SQL Server CE behaves this way. Maybe someone can give me a hint?
Edit 1:
I made a more in-depth analysis of each step during the parallel reading. The following chart shows the steps; the "actual reading" is the part inside the while (readerdt.Read()) section.
Edit 2:
Following ErikEJ's suggestion I added a TableDirect approach and made 150 runs, 75 with SELECT and 75 with TableDirect. I summed up the pre- and post-processing of the reading process, because it remains stable and nearly the same for all runs. What differs vastly is the actual reading process.
Every second run was done via TableDirect, so both approaches start to get drastically better results at around run 65 simultaneously. The range goes from 5.7 seconds up to 37.4 seconds.
This is the "actual reading" code. (There are four different databases/SDF files with four different SqlCe connections. Tested on a Ryzen 7 8-core CPU.)
private static List<List<double>> ReadDataFromDbTableDirect((double von, double bis) timepoints, List<(string compName, string resName)> components, int dbIdx)
{
    var values = new List<List<double>>();
    for (int j = 0; j < components.Count; j++)
    {
        values.Add(new List<double>());
    }
    using (var command = sqlCeCon[dbIdx].CreateCommand())
    {
        command.CommandType = CommandType.TableDirect;
        command.CommandText = table.TableName;
        command.IndexName = "PKTIME";
        command.SetRange(DbRangeOptions.InclusiveStart | DbRangeOptions.InclusiveEnd, new object[] { timepoints.von }, new object[] { timepoints.bis });
        using (var reader = (SqlCeDataReader)command.ExecuteReader(CommandBehavior.Default))
        {
            while (reader.Read())
            {
                for (int j = 0; j < components.Count; j++)
                {
                    if (!reader.IsDBNull(j))
                    {
                        if (j == 0)
                        {
                            values[j].Add(reader.GetInt32(j));
                        }
                        else
                        {
                            values[j].Add(reader.GetDouble(j));
                        }
                    }
                }
            }
        }
    }
    return values;
}
Still no idea why it has such a large delay.
The SDF looks like this
Edit 3:
Today I tried the same approach with a single database instead of four (to rule out problems with Parallel/Tasks). While TableDirect has a slight advantage here, the main problem of differing reading speeds persists (the SDF data is the same, so it is comparable).
Edit 4:
These are the results on another machine. Still large outliers, but a bit more stable. Overall, still the same issue.
Edit 5:
These are the results of a 4x smaller db - 500 runs. TableDirect & SELECT are (as in the previous benchmarks) run alternately, but to make the results easier to see in the graph they are shown in sequence. Notice that the overall time here is not 4 times smaller, as you'd expect, but ~8 times smaller. Same problem with the high reading times at the beginning. Next I'll optimize ArrayPooling and the like...
Edit 6:
To investigate further, I tried this on 2 different machines. PC#2 has Windows 11 Home, a Ryzen 5, a fairly new SSD and no antivirus; PC#3 has a pristine Windows 10 Pro installation with everything deactivated (Windows Defender included), an SSD and a Ryzen 7. On PC#3 there is just one peak (the first run); on PC#2 there are several besides the initial requests.
Edit 7 :
After ErikEJ's suggestion that it might be due to an index rebuild, I tried several things:
A: test the reading times after a fresh simulation (the db is freshly built in the simulation)
B: test the reading times after copying and loading a db from another folder and applying Verify (SqlCeEngine) to the db (Verify)
C: test the reading times after copying and loading a db from another folder (no special db treatment)
D: test the reading times after copying and loading a db from another folder, then making a quick first call for one row of all columns (preparation call)
I also tested SqlCeEngine Repair & Compact. They had nearly the same results as B.
It seems like verification solves the problem with the initial reading speed. Unfortunately, the verification itself takes quite long (>10 s on big dbs). Is there a quicker solution for this?
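For reference, the Verify step used in B boils down to something like this (a sketch; the path is a placeholder, and SqlCeEngine comes from System.Data.SqlServerCe):

// Sketch of the Verify step used in B (the connection string is a placeholder).
using (var engine = new SqlCeEngine(@"Data Source=C:\temp\quartal1.sdf"))
{
    // VerifyOption.Enhanced is more thorough than VerifyOption.Default and takes longer.
    bool ok = engine.Verify(VerifyOption.Enhanced);
}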
Result D is a complete surprise to me (mind the different scale). I don't understand what is happening here... Any guesses?
Result C shows the long reading times on initial readings, but not always, which is irritating. Maybe it is not the index rebuild that is causing this?
I still have very large variations in the reading speed on bigger databases.
I'm currently working on an improved reading process with pointers/memory to reduce GC pressure.
I will make the same tests today on another machine.
If anyone has an idea how to improve/stabilize reading speeds please let me know! Thanks in advance!
Related
I'm developing a Grammatical Evolution engine that does the following:
1. Parse a file with the BNF rules:
   <letter> ::= a|b|c|d
2. Generate random solutions based on some specific rules (basically, generate int arrays):
   i1 = [22341, 123412, 521123, 123123], i2 = [213213, 123, 5125, 634643]
3. Map those int arrays onto the rules in the BNF file:
   i1 = [22341, 123412, 521123, 123123] => ddbca
4. Check those solutions against some previously defined target:
   i1's value ('ddbca') == 'hello_world' ? 0 : 1
5. Select the best-performing solutions (top 5, top 10, etc.) for later use.
6. Randomly pick 2 solutions from the solution list and perform a crossover:
   i1 = [22341, 123412, 521123, 123123], i2 = [213213, 123, 5125, 634643]
   i1 x i2 => i3 = [22341, 123412, 5125, 634643]
7. Based on some predefined probability, execute a mutation on all individuals:
   for (int i = 0; i < i3.Length; i++)
   {
       if (random.NextDouble() <= 0.5)
       {
           i3[i] = random.Next();
       }
   }
8. Execute the mapping again:
   i3 = [22341, 123412, 5125, 634643] => qwerast
9. Check this new solution against the target.
10. Go back to step 5 and execute everything again.
The problem I'm facing is: my algorithm generates really large int arrays, but all of them are short-lived. After a generation, all solutions that weren't selected should be disposed. But since the arrays are getting bigger, almost all of them go to the LOH, and when the GC goes to collect them, my application's performance drops drastically.
In a single-core environment, it starts at 15 generations/s, and after 160 generations this drops to 3 generations per second.
I already tried to use ArrayPool, but since I have hundreds of solutions in memory, I saw no performance improvement and a large increase in memory usage.
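For reference, the rent/return pattern looks roughly like this (a sketch; requestedLength and random are placeholders, and it assumes a genome can be returned to the pool as soon as its generation has been evaluated):

// using System.Buffers;
// Rent may hand back an array larger than requested, so the logical length must be
// tracked separately (here: requestedLength).
int[] genome = ArrayPool<int>.Shared.Rent(requestedLength);
try
{
    for (int i = 0; i < requestedLength; i++)
    {
        genome[i] = random.Next();
    }
    // ... map, evaluate, select ...
}
finally
{
    // Return the buffer so the next generation can reuse it instead of allocating.
    ArrayPool<int>.Shared.Return(genome);
}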
I tried the ChunkedList idea from this link, and while the performance did not improve, the LOH usage dropped considerably.
I have already changed most of my classes to structs and tried to optimize the simple things (avoid LINQ, use for instead of foreach, etc.), but the big performance hit is in those large arrays.
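For context, the chunking idea amounts to something like this hypothetical ChunkedIntArray (a sketch of the technique, not the exact class from the linked post): every chunk stays under the 85,000-byte LOH threshold, so the buffers are collected as ordinary small-object-heap allocations.

// using System;
// Hypothetical chunked array: 20,000 ints * 4 bytes = 80,000 bytes per chunk,
// which stays below the 85,000-byte LOH threshold.
public sealed class ChunkedIntArray
{
    private const int ChunkSize = 20000;
    private readonly int[][] _chunks;

    public ChunkedIntArray(int length)
    {
        Length = length;
        _chunks = new int[(length + ChunkSize - 1) / ChunkSize][];
        for (int c = 0; c < _chunks.Length; c++)
        {
            int size = Math.Min(ChunkSize, length - c * ChunkSize);
            _chunks[c] = new int[size];
        }
    }

    public int Length { get; }

    public int this[int index]
    {
        get { return _chunks[index / ChunkSize][index % ChunkSize]; }
        set { _chunks[index / ChunkSize][index % ChunkSize] = value; }
    }
}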
Can any of you think of some kind of solution for this problem I'm facing?
Thank you in advance!
Given the task of improving the performance of a piece of code, I have come across the following phenomenon. I have a large collection of reference types in a generic Queue and I'm removing and processing the elements one by one, then adding them to another generic collection.
It seems the larger the elements are, the more time it takes to add an element to the collection.
Trying to narrow down the problem to the relevant part of the code, I've written a test (omitting the processing of elements, just doing the insert):
class Small
{
    public Small()
    {
        this.s001 = "001";
        this.s002 = "002";
    }

    string s001;
    string s002;
}

class Large
{
    public Large()
    {
        this.s001 = "001";
        this.s002 = "002";
        ...
        this.s050 = "050";
    }

    string s001;
    string s002;
    ...
    string s050;
}
static void Main(string[] args)
{
    const int N = 1000000;
    var storage = new List<object>(N);
    for (int i = 0; i < N; ++i)
    {
        //storage.Add(new Small());
        storage.Add(new Large());
    }

    List<object> outCollection = new List<object>();
    Stopwatch sw = new Stopwatch();
    sw.Start();
    for (int i = N - 1; i > 0; --i)
    {
        outCollection.Add(storage[i]);
    }
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);
}
On the test machine, using the Small class, it takes about 25-30 ms to run, while it takes 40-45 ms with Large.
I know that the outCollection has to grow from time to time to be able to store all the items, so there is some dynamic memory allocation. But giving it an initial collection size makes the difference even more obvious: 11-12 ms with Small and 35-38 ms with Large objects.
I am somewhat surprised, as these are reference types, so I was expecting the collections to work only with references to the Small/Large instances. I have read Eric Lippert's relevant article and know that references should not be treated as pointers. At the same time, AFAIK they are currently implemented as pointers, so their size and the collection's performance should be independent of element size.
I've decided to put up a question here hoping that someone could explain or help me to understand what's happening here. Aside the performance improvement, I'm really curious what is happening behind the scenes.
Update:
Profiling data using the diagnostic tools didn't help me much, although I have to admit I'm not an expert using the profiler. I'll collect more data later today to find where the bottleneck is.
The pressure on the GC is quite high of course, especially with the Large instances. But once the instances are created and stored in the storage collection and the program enters the loop, no collection is triggered any more, and memory usage doesn't increase significantly (outCollection is already pre-allocated).
Most of the CPU time is of course spent on memory allocation (JIT_New), around 62% of inclusive samples; the only other significant entry is System.Collections.Generic.List`1[System.__Canon].Add, with about 7%.
With 1 million items, the preallocated outCollection size is 8 million bytes (the same as the size of storage); one can suspect 64-bit addresses being stored in the collections.
Probably I'm not using the tools properly or don't have the experience to interpret the results correctly, but the profiler didn't help me to get closer to the cause.
If the loop is not triggering collections and it only copies pointers between 2 pre-allocated collections, how could the item size cause any difference? The cache hit/miss ratio should be more or less the same in both cases, as the loop is iterating over a list of "addresses" in both cases.
Thanks for all the help so far, I will collect more data, and put an update here if anything found.
I suspect that at least one action in the above (maybe some type checks) will require a dereference. The fact that many Small instances probably sit close together on the heap, and thus share cache lines, could then account for some of the difference (certainly many more of them can share a single cache line than Large instances).
Added to which, you are also accessing them in the reverse order in which they were allocated, which maximises such a benefit.
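One way to test that locality hypothesis (a sketch, not part of the original benchmark): shuffle storage before the timed loop so the objects are no longer visited in allocation order. If cache-line sharing is a factor, the gap between Small and Large should narrow.

// Fisher-Yates shuffle of 'storage' in place, run before sw.Start(), so the timed
// loop no longer visits the objects in (reverse) allocation order.
var rng = new Random(12345);
for (int i = storage.Count - 1; i > 0; --i)
{
    int j = rng.Next(i + 1);
    object tmp = storage[i];
    storage[i] = storage[j];
    storage[j] = tmp;
}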
I am using the C# client "StackExchange.Redis" to benchmark Redis.
The dataset is a text file with close to 16 million records. Each record has six entries, three of which are doubles and the other three are integers.
When I push the data (ListRightPushAsync in the API, i.e. RPUSH), it takes close to 4 minutes for all of it to be added to Redis.
Afterwards, when I retrieve the data (ListRange in the API, i.e. LRANGE), it takes almost 1.5 minutes to retrieve the whole list.
I am using the following code:
Connection:
ConnectionMultiplexer redis = ConnectionMultiplexer.Connect("localhost");
IDatabase db = redis.GetDatabase();
Insertion:
IEnumerable<string> lines = File.ReadLines(@"C:\Hep.xyz");
List<string> linesList = lines.ToList();
int count = lines.Count();
string[] toks;
RedisValue[] redisToks = { "", "", "", "", "", "" };
for (int i = 0; i < count; i++)
{
    toks = linesList[i].Split(' ', '\t');
    for (int j = 0; j < 6; j++)
    {
        redisToks[j] = toks[j];
    }
    db.ListRightPushAsync("PS:DATA:", redisToks);
    if (i % 1000000 == 0)
    {
        Console.WriteLine("Lines Read: {0}", i);
    }
}
Console.WriteLine("Press any key to continue ...");
Console.ReadLine();
Retrieval:
long len = db.ListLength("PS:DATA:");
long start = 0;
long end = 99999;
while (end < len)
{
    RedisValue[] val = db.ListRange("PS:DATA:", start, end);
    int length = val.Length;
    start += 100000;
    end += 100000;
}
Console.WriteLine("Press any key to continue ...");
Console.ReadLine();
For configuration:
I have set maxmemory to 4 GB and maxmemory-policy to volatile-lru.
I am running all of this locally on my system. My system specs are:
8 GB RAM
Intel Core i7-5500U CPU @ 2.40 GHz (4 CPUs), ~2.4 GHz
Could you please help me identify the factors I need to look into to improve performance? Also, is Redis suitable for this kind of dataset?
This is not because Redis is slow. When you save data into Redis, the time also includes file I/O (disk I/O) and network I/O, especially the time spent reading the lines from the file on disk, which is a big cost. That is why insertion took about 4 minutes while retrieving the data from Redis takes only 1.5 minutes.
In conclusion, Redis works well in your case. One more thing to speed up the insertion: you can use Redis pipelining to decrease the network transport time.
An async write is not the same as pipelining. An async write merely does not block the client, whereas pipelining sends commands in batches and reads the replies in batches, so they are different.
See the "It's not just a matter of RTT" section at https://redis.io/topics/pipelining: pipelining also saves the Redis server read() and write() time.
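A rough sketch of what batching could look like with StackExchange.Redis (the 10,000-line chunk size is an arbitrary assumption; db and linesList are the objects from the question's code):

// using StackExchange.Redis; using System.Threading.Tasks;
// IBatch queues the commands and sends them to the socket together on Execute(),
// then the tasks for one chunk are awaited in one go.
const int chunkSize = 10000;
var tasks = new List<Task>(chunkSize);
for (int i = 0; i < linesList.Count; i += chunkSize)
{
    IBatch batch = db.CreateBatch();
    int upper = Math.Min(i + chunkSize, linesList.Count);
    for (int k = i; k < upper; k++)
    {
        string[] toks = linesList[k].Split(' ', '\t');
        // Each pending command gets its own array; reusing one buffer across
        // un-awaited async calls would let later iterations overwrite earlier values.
        var vals = new RedisValue[6];
        for (int j = 0; j < 6; j++)
        {
            vals[j] = toks[j];
        }
        tasks.Add(batch.ListRightPushAsync("PS:DATA:", vals));
    }
    batch.Execute();               // release the queued commands
    Task.WaitAll(tasks.ToArray()); // wait for this chunk's replies
    tasks.Clear();
}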
Yes, what you are facing is the issue mentioned in the documentation on Redis's official website:
Redis lists are implemented via Linked Lists. This means that even if you have millions of elements inside a list, the operation of adding a new element in the head or in the tail of the list is performed in constant time. The speed of adding a new element with the LPUSH command to the head of a list with ten elements is the same as adding an element to the head of list with 10 million elements.
This is the reason why your insertion operation is very fast. The document further continues:
What's the downside? Accessing an element by index is very fast in lists implemented with an Array (constant time indexed access) and not so fast in lists implemented by linked lists (where the operation requires an amount of work proportional to the index of the accessed element).
The documentation further suggests using Sorted Sets if you want fast access:
When fast access to the middle of a large collection of elements is important, there is a different data structure that can be used, called sorted sets. Sorted sets will be covered later in this tutorial.
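A rough sketch of that approach with StackExchange.Redis, assuming the record's line index can serve as the score (note that sorted-set members must be unique, so duplicate lines would need a disambiguating prefix, e.g. the index itself):

// Insert: one member per record, scored by its line index.
for (int i = 0; i < linesList.Count; i++)
{
    db.SortedSetAdd("PS:DATA:ZSET", linesList[i], i);
}

// Retrieve records 1,000,000 .. 1,099,999 directly by score range:
RedisValue[] slice = db.SortedSetRangeByScore("PS:DATA:ZSET", 1000000, 1099999);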
I'm trying to read all of the feature data from a particular shapefile. In this case, I'm using DotSpatial to open the file, and I'm iterating through the features. This particular shapefile is only 9 MB in size, and the dbf file is 14 MB. There are roughly 75k features to loop through.
Note, this is all done programmatically through a console app, so there is no rendering or anything involved.
When loading the shapefile, I reproject, then I iterate. The loading and reprojecting are super quick. However, as soon as the code reaches my foreach block, it takes nearly 2 full minutes to load the data and uses roughly 2 GB of memory when debugging in Visual Studio. This seems very, very excessive for what's a reasonably small data file.
I've run the same code outside of Visual Studio, from the command line; the time is still roughly 2 full minutes, and the process uses about 1.3 GB of memory.
Is there any way to speed this up at all?
Below is my code:
// Load the shape file and project to GDA94
Shapefile indexMapFile = Shapefile.OpenFile(shapeFilePath);
indexMapFile.Reproject(KnownCoordinateSystems.Geographic.Australia.GeocentricDatumofAustralia1994);

// Gets slow here and takes forever to get to the first item
foreach (IFeature feature in indexMapFile.Features)
{
    // Once inside the loop, it's blazingly quick.
}
Interestingly, when I use the VS immediate window, it's super super fast, no delay at all...
I've managed to figure this out...
For some reason, calling foreach on the Features collection is painfully slow.
However, as these files have a 1:1 mapping between features and data rows (each feature has a corresponding data row), I've modified the code slightly to the following. It's now very quick: less than a second to start the iterations.
// Load the shape file and project to GDA94
Shapefile indexMapFile = Shapefile.OpenFile(shapeFilePath);
indexMapFile.Reproject(KnownCoordinateSystems.Geographic.Australia.GeocentricDatumofAustralia1994);

// Get the map index from the Feature data
for (int i = 0; i < indexMapFile.DataTable.Rows.Count; i++)
{
    // Get the feature
    IFeature feature = indexMapFile.Features.ElementAt(i);

    // Now it's very quick to iterate through and work with the feature.
}
I wonder why this would be. I think I need to look at the iterator on the IFeatureList implementation.
Cheers,
Justin
I had the same problem with very large files (1.2 million features): populating the .Features collection never ends.
But if you ask for each feature individually, you avoid the memory and delay overheads:
int lRows = fs.NumRows();
for (int i = 0; i < lRows; i++)
{
    // Get the feature
    IFeature pFeat = fs.GetFeature(i);

    StringBuilder sb = new StringBuilder();
    {
        sb.Append(Guid.NewGuid().ToString());
        sb.Append("|");
        sb.Append(pFeat.DataRow["MAPA"]);
        sb.Append("|");
        sb.Append(pFeat.BasicGeometry.ToString());
    }
    pLinesList.Add(sb.ToString());

    lCnt++;
    if (lCnt % 10 == 0)
    {
        pOld = Console.ForegroundColor;
        Console.ForegroundColor = ConsoleColor.DarkGreen;
        Console.Write("\r{0} de {1} ({2}%)", lCnt.ToString(), lRows.ToString(), (100.0 * ((float)lCnt / (float)lRows)).ToString());
        Console.ForegroundColor = pOld;
    }
}
Look for the GetFeature method.
I am trying to write a program to perform external merge sort on a massive dataset. As a first step, I need to split the dataset into chunks that would fit into RAM. I have the following questions:
Suppose my machine has x amount of RAM installed. Is there a theoretical maximum limit on how much of it could be made available to my process?
When I run the program below, I get a non-zero value for available memory when it fails: there is still 2.8 GB of free RAM when the memory allocation fails. Why does the allocation fail when there is still unused RAM left? What explains the observed behavior?
List<string> list = new List<string>();
try
{
    while (true)
    {
        list.Add("random string");
    }
}
catch (Exception e)
{
    Microsoft.VisualBasic.Devices.ComputerInfo CI = new ComputerInfo();
    Console.WriteLine(CI.AvailablePhysicalMemory);
}
If there are other programs running concurrently, how do I determine how much RAM is available for use by the current process?
Here is what you're looking for: ComputerInfo.AvailablePhysicalMemory
Gets the total amount of free physical memory for the computer.
private ulong GetMaxAvailableRAM()
{
    Microsoft.VisualBasic.Devices.ComputerInfo CI = new ComputerInfo();
    return CI.AvailablePhysicalMemory;
}
NOTE: You will need to add a reference to Microsoft.VisualBasic.
UPDATE:
Your sample that fills the RAM will run into a few other limits first.
It will first hit an OutOfMemoryException if you're not building for 64-bit. You should change your solution to build for x64 (64-bit) in the solution configuration:
Secondly, your List has a maximum supported array dimension. By adding many small objects, you will hit that limit first.
Here is a quick and dirty example making a List of Lists of strings.
(This could be smaller code using Images etc., but I was trying to stay similar to your example.)
When this is run, it will consume all of your RAM and eventually start paging to disk. Remember that Windows has virtual memory, which will eventually get used up too, but it's much slower than regular RAM. Also, if it uses all of that up, it might not even be able to allocate the space to instantiate the ComputerInfo class.
NOTE: Be careful, this code will consume all RAM and potentially make your system unstable.
List<List<string>> list = new List<List<string>>();
try
{
    for (UInt32 I = 0; I < 134217727; I++)
    {
        List<string> SubList = new List<string>();
        list.Add(SubList);
        for (UInt32 x = 0; x < 134217727; x++)
        {
            SubList.Add("random string");
        }
    }
}
catch (Exception Ex)
{
    Console.WriteLine(Ex.Message);
    Microsoft.VisualBasic.Devices.ComputerInfo CI = new ComputerInfo();
    Console.WriteLine(CI.AvailablePhysicalMemory);
}
NOTE: To prevent using the disk, you could try something like System.Security.SecureString, which prevents itself from being written to disk, but it would be very, very slow to accumulate enough of it to fill your RAM.
Here is a test run showing the Physical Memory usage. I started running at (1)
For your final implementation, I suggest you use the ComputerInfo.AvailablePhysicalMemory value to determine how much of your data you can load before loading it (leaving some for the OS), and also look into locking objects in memory (usually used for marshaling, etc.) to prevent accidental use of virtual memory.
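For example, a minimal sketch of sizing an external-sort chunk from the currently free physical memory (the 50% headroom factor is an arbitrary assumption; tune it for your setup):

// using Microsoft.VisualBasic.Devices;
var info = new ComputerInfo();
ulong freeBytes = info.AvailablePhysicalMemory;
// Keep half of the free physical memory as headroom for the OS and other processes.
long chunkBudgetBytes = (long)(freeBytes / 2);
Console.WriteLine("Free: {0:N0} bytes, chunk budget: {1:N0} bytes", freeBytes, chunkBudgetBytes);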