Drawing Many Objects to Screen - C#

I'm working on a project in which we need to summarize a substantial amount of data in the form of a heat map. This data will be kept in a database for as long as possible. At some point, we will need to store a summary in a matrix (possibly?) before we can draw the blocks for the heat map to the screen. We are creating a Windows Forms application in C#.
Let's assume the heat map is going to summarize a log file for an online mapping program such as Google Maps. It will assign a color to a particular address or region based on the number of times a request was made to that region/address. It can summarize the data at differing levels of detail. That is, each block on the heat map can summarize data for a particular address (maximum detail, therefore millions/billions of blocks) or it can summarize requests to a street, city, or country (minimum detail -- few blocks, as each represents a country). Imagine that millions of requests were made for addresses. We have considered summarizing this with a database. The problem is that we need to draw so many blocks to the screen (up to billions, but usually far fewer). Let's assume this data is summarized in a database table that stores the number of hits to the larger regions. Can we draw the blocks to the window without constructing an object for each region, or even without bringing in all of the information from the db table? That's my primary concern, because if we did construct a matrix, it could be around 10 GB for a demanding request.
I'm curious to know how many blocks we can draw to the screen and what the best approach to this may be (e.g. Direct3D, XNA). From the above, you can see the range will vary substantially, and we expect the potential for billions of squares that need to be drawn. We will have a vertical scroll bar to scroll down quickly to see other blocks.
Overall, I'm wondering how we might accomplish this with C#. Creating the matrix for the demanding request could require around 10 gigabytes. Is there a way to draw to the screen that does not require a substantial amount of memory (i.e. without creating an object for each block)? If the results of a SQL query could be translated directly into rendered blocks on the screen, that would be ideal (i.e. no constructing of objects, etc.). All we need are squares whose only property is color, and we might need to maintain a number for each block.
Note:
We are pretty sure about how we will draw the heat map (how zooming, scrolling, etc. should appear to the user). To clarify, I'm more concerned about how we will implement our idea. Is there a library or some method that allows us to draw this many objects without constructing a billion objects and using gigabytes of memory? Each block is essentially a group of pixels (20x20) that are all the same color. I don't believe this should necessitate constructing 1 billion objects.
Thanks!

If this is really for a graphic heat map, then I agree with the comments that an image that's at least 780 laptop screens wide is impractical. If you have this information in a SQL(?) database somewhere, then you can write a fancy query that partitions your results into buckets of a certain width. The database should be able to aggregate these records into 1680 (pixels wide) buckets efficiently.
Furthermore, if your buckets are of a fixed width (yielding a fixed width heat-map image) you could pre-generate the bucket numbers for the "addresses" in your database. Indexed properly, grouping by this would be very fast.
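Here is a minimal sketch of that kind of bucketing query driven from C#, assuming SQL Server via System.Data.SqlClient and a hypothetical RequestLog table with an integer XCoord column holding each request's horizontal map position. The database does all the aggregation; only 1680 small rows come back to the client.

    using System;
    using System.Data.SqlClient;

    class BucketQuery
    {
        public static int[] LoadBuckets(string connectionString, int minX, int maxX, int bucketCount = 1680)
        {
            var counts = new int[bucketCount];
            double bucketWidth = (maxX - minX + 1) / (double)bucketCount;

            // Hypothetical schema: RequestLog(XCoord int, ...). One row per bucket comes back.
            const string sql = @"
                SELECT CAST((XCoord - @minX) / @bucketWidth AS int) AS Bucket,
                       COUNT(*) AS Hits
                FROM   RequestLog
                WHERE  XCoord BETWEEN @minX AND @maxX
                GROUP BY CAST((XCoord - @minX) / @bucketWidth AS int)";

            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(sql, conn))
            {
                cmd.Parameters.AddWithValue("@minX", minX);
                cmd.Parameters.AddWithValue("@maxX", maxX);
                cmd.Parameters.AddWithValue("@bucketWidth", bucketWidth);
                conn.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        int bucket = reader.GetInt32(0);
                        if (bucket >= 0 && bucket < bucketCount)
                            counts[bucket] = reader.GetInt32(1);
                    }
                }
            }
            return counts;
        }
    }

The returned array is exactly one entry per on-screen bucket, so memory usage is bounded by the display width, not by the number of logged requests.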
If you DO need to see a 1:1 image, you might consider only rendering a section of the image that you're scrolled to. This would significantly reduce the amount of memory necessary to store the current view. Assuming you don't need to actually view all 780 screens worth of data at 100% (especially if you couple this with the "big picture view" strategy above) then you'll save on processing too.
The aggregate function for the "big picture view" might be MAX, SUM, or AVG. If these functions aren't appropriate, please explain more about the particular features you'd be looking for in the heat map.
As far as the drawing itself, you don't need "objects" for each box, you just need to draw the pixels on a graphics object.
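As a rough sketch of that last point: in a Windows Forms control, each block is just a FillRectangle call in the paint handler, driven by the summary array from the query. No object is ever created per block; the only assumptions here are the array name and the simple black-to-red color ramp.

    using System;
    using System.Drawing;
    using System.Windows.Forms;

    class HeatMapPanel : Panel
    {
        public int[] Counts = new int[0];   // one hit count per block, filled from the query
        public int BlockSize = 20;          // 20x20 pixel blocks

        public HeatMapPanel()
        {
            DoubleBuffered = true;          // avoid flicker while repainting
        }

        protected override void OnPaint(PaintEventArgs e)
        {
            base.OnPaint(e);
            int blocksPerRow = Math.Max(1, Width / BlockSize);

            using (var brush = new SolidBrush(Color.Black))
            {
                for (int i = 0; i < Counts.Length; i++)
                {
                    int x = (i % blocksPerRow) * BlockSize;
                    int y = (i / blocksPerRow) * BlockSize;

                    // Map the count to a colour (hypothetical ramp: black -> red).
                    int intensity = Math.Min(255, Counts[i]);
                    brush.Color = Color.FromArgb(intensity, 0, 0);
                    e.Graphics.FillRectangle(brush, x, y, BlockSize, BlockSize);
                }
            }
        }
    }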

I think the technique you are looking for is called "virtualization". I don't mean hardware virtualization, but the technique where you create a concrete visual object only for the items that are currently visible. Many grids and lists use this technique to show hundreds of thousands of items at normal speed and memory consumption. You can also reuse those visual objects while swapping the underlying data objects.
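A minimal sketch of that idea for the heat map, assuming a hypothetical LoadRow delegate that fetches the per-block counts for one row (from a small SQL query or a cache): only the rows of blocks that intersect the viewport are ever materialised, so memory use is bounded by the viewport rather than by the total number of blocks.

    using System;
    using System.Drawing;

    class VirtualizedHeatMap
    {
        const int BlockSize = 20;

        // Hypothetical data access: returns the hit counts for one row of blocks.
        readonly Func<int, int[]> LoadRow;

        public VirtualizedHeatMap(Func<int, int[]> loadRow) { LoadRow = loadRow; }

        public void DrawVisible(Graphics g, int scrollOffsetY, int viewportHeight)
        {
            int firstRow = scrollOffsetY / BlockSize;
            int lastRow = (scrollOffsetY + viewportHeight) / BlockSize;

            using (var brush = new SolidBrush(Color.Black))
            {
                for (int row = firstRow; row <= lastRow; row++)
                {
                    int[] counts = LoadRow(row);            // only this thin slice exists in memory
                    int y = row * BlockSize - scrollOffsetY;
                    for (int col = 0; col < counts.Length; col++)
                    {
                        brush.Color = Color.FromArgb(Math.Min(255, counts[col]), 0, 0);
                        g.FillRectangle(brush, col * BlockSize, y, BlockSize, BlockSize);
                    }
                }
            }
        }
    }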
I would also question the necessity of displaying billions of details. You should make it similar to zooming, or aggregate the data to show only a few items and then let the user choose a specific part or piece of the data. But I guess you have that thought out.

Related

Managing textures in c# via instantiating them all in one or several arrays

I am developing a game in C# (I'm rather new to C#). I would like to know whether this approach would adversely affect performance:
Instantiate all textures in four categories, i.e. 4 different arrays.
This is to keep the categories of textures separate from each other (for example, MonsterA needs 3 textures that are all in the same array).
Have objects with generic Lists to point at the texture(s) they need.
Since the textures are in the same array, this should help with caching, etc., I think.
As far as I know, a List would only hold references, which have locality, not the actual textures themselves. I am using SFML.Net, but this should apply to, say, listing pictures of some sort, or listing any objects you want to have locality.
The question then is: will doing this adversely affect performance, will it work as I expect, or will it not matter at all? And why?
If you are very serious about that - try all approaches and measure/compare. Don't forget to set your goals first, otherwise you'll be trying to save time/memory when it does not cause a problem for your case. Note that you need to measure the complete sequence you care about, not just the "load textures" part.
It is very unlikely that performance will be impacted by the way you arrange the metadata portion of the textures (everything but the image bytes) - the amount of memory used by the images themselves will be much bigger than any lists/dictionaries you reference the textures from.
The main optimizations with textures are:
not loading them at all until they are needed (or likely to be needed soon) - see the sketch below
somehow making them smaller (multiple detail levels, compression, ...)
sometimes the number of textures matters, and image strips / sprite sheets can be used to combine multiple images into one.
But for most projects, doing nothing special is a good start - a finished game/program that is somewhat slower than you'd like is much better than one that is 1/3 complete but has very fast texture loading (or whatever else you decided to over-optimize).
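If you do want the first optimization (lazy loading), a minimal sketch could look like the following. It assumes SFML.Net's Texture(string) constructor; the cache itself is plain C# and would look the same with any texture type.

    using System.Collections.Generic;
    using SFML.Graphics;

    class TextureCache
    {
        readonly Dictionary<string, Texture> cache = new Dictionary<string, Texture>();

        // The texture is only read from disk the first time it is requested.
        public Texture Get(string path)
        {
            Texture tex;
            if (!cache.TryGetValue(path, out tex))
            {
                tex = new Texture(path);
                cache[path] = tex;
            }
            return tex;
        }
    }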

High performance graphics using the WPF Visual layer

I am creating a WPF mapping program which will potentially load and draw hundreds of files to the screen at any one time, and a user may want to zoom and pan this display. Some of these file types may contain thousands of points, which would most likely be connected as some kind of path. Other supported formats will include TIFF files.
Is it better for performance to have a single DrawingVisual to which all data is drawn, or should I be creating a new DrawingVisual for each file loaded?
If anyone can offer any advice on this it would be much appreciated.
You will find lots of related questions on Stack Overflow; however, not all of them mention that one of the most high-performance ways to draw large amounts of data to the screen is to use the WriteableBitmap API. I suggest taking a look at the WriteableBitmapEx open source project on CodePlex. Disclosure: I have contributed to this once, but it is not my library.
Having experimented with DrawingVisual, StreamGeometry, OnRender, and Canvas, all of these fall over once you have to draw 1,000 or more "objects" to the screen. There are techniques that deal with the virtualization of a canvas (there's a million-item demo with a virtualized canvas), but even this is limited to roughly 1,000 items visible at one time before slowdown. WriteableBitmap lets you access a bitmap directly and draw on it (old-school style), meaning you can draw tens of thousands of objects at speed. You are free to implement your own optimisations (multi-threading, level of detail), but note that you don't get many frills with that API. You are literally doing the work yourself.
There is one caveat, though. While WPF uses the CPU for tessellation and the GPU for rendering, WriteableBitmap uses the CPU for everything. Therefore the fill rate (number of pixels rendered per frame) becomes the bottleneck, depending on your CPU power.
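A minimal sketch of drawing coloured blocks straight into a WriteableBitmap with the plain WPF API (no WriteableBitmapEx extensions); the resulting bitmap can be shown in an Image element. The counts array and the simple colour ramp are assumptions for illustration.

    using System;
    using System.Windows;
    using System.Windows.Media;
    using System.Windows.Media.Imaging;

    static class HeatMapBitmap
    {
        public static WriteableBitmap Render(int[,] counts, int blockSize)
        {
            int rows = counts.GetLength(0), cols = counts.GetLength(1);
            var bmp = new WriteableBitmap(cols * blockSize, rows * blockSize, 96, 96, PixelFormats.Bgr32, null);

            int stride = blockSize * 4;                       // bytes per row in the block buffer
            var block = new byte[blockSize * blockSize * 4];  // reused pixel buffer for one block

            for (int r = 0; r < rows; r++)
            {
                for (int c = 0; c < cols; c++)
                {
                    byte red = (byte)Math.Min(255, counts[r, c]);
                    for (int i = 0; i < block.Length; i += 4)
                    {
                        block[i] = 0; block[i + 1] = 0; block[i + 2] = red; block[i + 3] = 255; // B, G, R, unused
                    }
                    var rect = new Int32Rect(c * blockSize, r * blockSize, blockSize, blockSize);
                    bmp.WritePixels(rect, block, stride, 0);  // write this block's pixels directly
                }
            }
            return bmp;
        }
    }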
Failing that, if you really need high-performance rendering, I'd suggest taking a look at SharpDX (managed DirectX) and its interop with WPF. This will give you the highest performance, as it uses the GPU directly.
Using many small DrawingVisuals with few details rendered per visual gave better performance in my experience compared to fewer DrawingVisuals with more details rendered per visual. I also found that deleting all of the visuals and rendering new visuals was faster than reusing existing visuals when a redraw was required. Breaking each map into a number of visuals may help performance.
As with anything performance related, conducting timing tests with your own scenarios is the best way to be sure.

Charting massive amounts of data

We are currently using ZedGraph to draw a line chart of some data. The input data comes from a file of arbitrary size; therefore, we do not know the maximum number of data points in advance. However, by opening the file and reading the header, we can find out how many data points are in the file.
The file format is essentially [time (double), value (double)]. However, the entries are not uniform on the time axis. There may not be any points between, say, t = 0 sec and t = 10 sec, but there might be 100K entries between t = 10 sec and t = 11 sec, and so on.
As an example, our test dataset file is ~2.6 GB and it has 324M points. We'd like to show the entire graph to the user and let her navigate through the chart. However, loading 324M points into ZedGraph is not only impossible (we're on a 32-bit machine) but also not useful, since there is no point in having so many points on the screen.
Using the FilteredPointList feature of ZedGraph also appears to be out of question, since that requires loading the entire data first and then performing filtering on that data.
So, unless we're missing something, it appears that our only solution is to -somehow- decimate the data. However, as we keep working on it, we're running into a lot of problems:
1- How do we decimate data that is not arriving uniformly in time?
2- Since the entire data can't be loaded into memory, any algorithm needs to work on the disk and so needs to be designed carefully.
3- How do we handle zooming in and out, especially when the data is not uniform on the x-axis?
If the data were uniform, upon initial load of the graph we could Seek() by a predefined number of entries in the file and choose every Nth sample to feed to ZedGraph. However, since the data is not uniform, we have to be more intelligent in choosing the samples to display, and we can't come up with any intelligent algorithm that would not have to read the entire file.
I apologize since the question does not have razor-sharp specificity, but I hope I could explain the nature and scope of our problem.
We're on Windows 32-bit, .NET 4.0.
I've needed this before, and it's not easy to do. I ended up writing my own graph component because of this requirement. It turned out better in the end because I put in all the features we needed.
Basically, you need to get the range of data (min and max possible/needed index values), subdivide it into segments (let's say 100 segments), and then determine a value for each segment by some algorithm (average value, median value, etc.). Then you plot based on those summarized 100 elements. This is much faster than trying to plot millions of points :-).
So what I am saying is similar to what you are saying. You mention you do not want to plot every Xth element because there might be a long stretch of time (index values on the x-axis) between elements. What I am saying is that for each subdivision of data you determine the best value, and take that as the data point. My method is index-value based, so in your example of no data between the 0 sec and 10 sec index values I would still put data points there; they would just have the same values among themselves.
The point is to summarize the data before you plot it. Think through your algorithms to do that carefully, there are lots of ways to do so, choose the one that works for your application.
You might get away with not writing your own graph component and just write the data summarization algorithm.
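A minimal sketch of that summarization step (the names are hypothetical): the time range is split into a fixed number of segments and each segment is reduced to one value, the average here, before anything is handed to the chart. Because it is a single pass, the points could be streamed from the file rather than held in memory.

    using System;
    using System.Collections.Generic;

    static class SeriesSummary
    {
        // points: (time, value) pairs; tMin/tMax bound the range being charted.
        public static double[] Summarize(IEnumerable<Tuple<double, double>> points,
                                         double tMin, double tMax, int segmentCount)
        {
            var sums = new double[segmentCount];
            var counts = new int[segmentCount];
            double segWidth = (tMax - tMin) / segmentCount;

            foreach (var p in points)                      // single pass; works on a stream
            {
                int s = (int)((p.Item1 - tMin) / segWidth);
                if (s < 0 || s >= segmentCount) continue;
                sums[s] += p.Item2;
                counts[s]++;
            }

            var result = new double[segmentCount];
            for (int s = 0; s < segmentCount; s++)
                result[s] = counts[s] > 0 ? sums[s] / counts[s] : 0;   // empty segments stay at 0 (or repeat the previous value, as described above)
            return result;
        }
    }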
I would approach this in two steps:
Pre-processing the data
Displaying the data
Step 1
The file should be preprocessed into a binary fixed-record-format file. Adding an index to the format, each record would be int, double, double.
See this article for speed comparisons:
http://www.codeproject.com/KB/files/fastbinaryfileinput.aspx
You can then either break up the file into time intervals, say one per hour or day, which will give you an easy way to access different time intervals. You could also just keep one big file and have an index file which tells you where to find specific times, e.g.
1,1/27/2011 8:30:00
13456,1/27/2011 9:30:00
By using one of these methods you will be able to quickly find any block of data, either by time (via an index or file name) or by number of entries (thanks to the fixed byte format).
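A minimal sketch of reading such a fixed-format binary file (record layout int, double, double = 20 bytes): because every record has the same size, any record can be reached with a single Seek, without touching the rest of the file.

    using System.IO;

    class FixedRecordFile
    {
        const int RecordSize = sizeof(int) + sizeof(double) + sizeof(double); // 20 bytes per record

        public static void ReadRecord(string path, long recordIndex,
                                      out int index, out double time, out double value)
        {
            using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
            using (var reader = new BinaryReader(fs))
            {
                fs.Seek(recordIndex * RecordSize, SeekOrigin.Begin);  // jump straight to the record
                index = reader.ReadInt32();
                time = reader.ReadDouble();
                value = reader.ReadDouble();
            }
        }
    }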
Step 2
Ways to show data
1. Just display each record by index.
2. Normalize the data and create aggregate data bars with open, high, low, close values:
a. By time
b. By record count
c. By difference between values
For more possible ways to aggregate non-uniform data sets, you may want to look at the different methods used to aggregate trade data in the financial markets. Of course, for speed in real-time rendering you would want to create files with this data already aggregated.
1- How do we decimate data that is not arriving uniformly in time?
(Note - I'm assuming your loader datafile is in text format.)
On a similar project, I had to read datafiles that were more than 5GB in size. The only way I could parse it out was by reading it into an RDBMS table. We chose MySQL because it makes importing text files into datatables drop-dead simple. (An interesting aside -- I was on a 32-bit Windows machine and couldn't open the text file for viewing, but MySQL read it no problem.) The other perk was MySQL is screaming, screaming fast.
Once the data was in the database, we could easily sort it and summarize large amounts of data with single queries (using built-in SQL summary functions like SUM). MySQL could even write its query results back out to a text file for use as loader data.
Long story short, consuming that much data mandates the use of a tool that can summarize the data. MySQL fits the bill (pun intended...it's free).
A relatively easy alternative I've found to do this is to do the following:
Iterate through the data in small point groupings (say 3 to 5 points at a time - the larger the group, the faster the algorithm will work but the less accurate the aggregation will be).
Compute the min & max of the small group.
Remove all points that are not the min or max from that group (i.e. you only keep 2 points from each group and omit the rest).
Keep looping through the data (repeating this process) from start to end removing points until the aggregated data set has a sufficiently small number of points where it can be charted without choking the PC.
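A minimal sketch of this min/max decimation (shown in-memory for clarity; the same loop can be run over chunks streamed from disk): each group of groupSize points is reduced to its min and max, kept in their original time order so the real peaks and valleys survive.

    using System;
    using System.Collections.Generic;

    static class MinMaxDecimator
    {
        // points: (time, value) pairs in time order.
        public static List<Tuple<double, double>> Decimate(
            IList<Tuple<double, double>> points, int groupSize)
        {
            var result = new List<Tuple<double, double>>();
            for (int start = 0; start < points.Count; start += groupSize)
            {
                int end = Math.Min(start + groupSize, points.Count);
                int minIdx = start, maxIdx = start;
                for (int i = start; i < end; i++)
                {
                    if (points[i].Item2 < points[minIdx].Item2) minIdx = i;
                    if (points[i].Item2 > points[maxIdx].Item2) maxIdx = i;
                }
                // Keep the min and max in their original order; they are real data points.
                result.Add(points[Math.Min(minIdx, maxIdx)]);
                if (minIdx != maxIdx) result.Add(points[Math.Max(minIdx, maxIdx)]);
            }
            return result;
        }
    }

Calling Decimate repeatedly, as described in the steps above, keeps shrinking the set until it is small enough to chart comfortably.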
I've used this algorithm in the past to take datasets of ~10 million points down to the order of ~5K points without any obvious visible distortion to the graph.
The idea here is that, while throwing out points, you're preserving the peaks and valleys so the "signal" viewed in the final graph isn't "averaged down" (normally, if averaging, you'll see the peaks and the valleys become less prominent).
The other advantage is that you're always seeing "real" datapoints on the final graph (it's missing a bunch of points, but the points that are there were actually in the original dataset so, if you mouse over something, you can show the actual x & y values because they're real, not averaged).
Lastly, this also helps with the problem of not having consistent x-axis spacing (again, you'll have real points instead of averaging X-Axis positions).
I'm not sure how well this approach would work w/ 100s of millions of datapoints like you have, but it might be worth a try.

For heavy graphics apps in C#, which will be more efficient: double buffering or BufferedGraphics?

Hello, I have a heavy graphics application where I have to redraw the graphics every 2-10 seconds; this interval varies depending on the source application, which is sending data to my application via UDP.
I have some static graphics that never change, some semi-dynamic graphics that are occasionally updated but normally remain unchanged, and all the other graphics are dynamic; there are approximately 8000 dynamic objects.
I am working in C# and have read about the two techniques given in the title. Which one will be more efficient in this case? Help required.
Thanks in advance.
How large are your objects?
One probably can't predict what's more efficient here; it depends on everything: the type of objects, the size of objects, the complexity of converting data to visible graphics, and most of all the speed of your network connection, which will limit your application.
In the end you probably want to try both and measure their performance. Even then you might want to implement it as a setting so the user can flip between the two.
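A minimal sketch of the two options being compared, so both can be timed with the same drawing code; the drawing itself is omitted and the control name is hypothetical. Option A relies on the built-in WinForms double buffering, option B manages a BufferedGraphics back buffer explicitly.

    using System.Drawing;
    using System.Windows.Forms;

    class GraphPanel : Panel
    {
        readonly bool useBufferedGraphics;
        BufferedGraphics buffer;

        public GraphPanel(bool useBufferedGraphics)
        {
            this.useBufferedGraphics = useBufferedGraphics;
            if (!useBufferedGraphics)
            {
                // Option A: let WinForms double-buffer the control.
                SetStyle(ControlStyles.OptimizedDoubleBuffer |
                         ControlStyles.AllPaintingInWmPaint |
                         ControlStyles.UserPaint, true);
            }
        }

        protected override void OnPaint(PaintEventArgs e)
        {
            Graphics g = e.Graphics;
            if (useBufferedGraphics)
            {
                // Option B: draw into an explicit back buffer, then blit it in one go.
                // (The buffer should be disposed and re-allocated when the control resizes.)
                if (buffer == null)
                    buffer = BufferedGraphicsManager.Current.Allocate(e.Graphics, ClientRectangle);
                g = buffer.Graphics;
            }

            g.Clear(Color.White);
            // ... draw the static, semi-dynamic and ~8000 dynamic objects here ...

            if (useBufferedGraphics)
                buffer.Render(e.Graphics);
        }
    }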

How to design a high performance grid with VS 2005 (specifically C#)

I need to build a high performance WinForms data grid using Visual Studio 2005, and I'm at a loss as to where to start. I've built plenty of data grid applications, but none of those were very good when the data was constantly refreshing.
The grid is going to be roughly 100 rows by 40 columns, and each cell in the grid is going to update between 1 and 2 times a second (some cells possibly more). To me, this is the biggest drawback of the out-of-the-box data grid: the repainting isn't very efficient.
A couple of caveats:
1) No third party vendors. This grid is the backbone of all our applications, so while XCeed or Syncfusion or whatever might get us up and running faster, we'd slam into its limitations and be hosed. I'd rather put in the extra work up front and have a grid that does exactly what we need.
2) I have access to Visual Studio 2008, so if it would be much better to start this in 2008, then I can do that. If it's a toss-up, I'd like to stick with 2005.
So whats the best approach here?
I would recommend the following approach if you have many cells that are updating at different rates. Rather than try to invalidate each cell each time its value changes, you would be better off limiting the refresh rate.
Have a timer that fires at a predefined rate, such as 4 times per second, and then each time it fires you repaint the cells that have changed since the last time around. You can then tweak the update rate in order to find the best compromise between performance and usability with some simple testing.
This has the advantage of not updating too often and thereby killing your CPU performance. It batches up changes between each refresh cycle, so two quick changes to a value that occur fractions of a second apart do not cause two refreshes when only the latest value is actually worth drawing.
Note this delayed drawing only applies to the rapid updates in value and does not apply to general drawing such as when the user moves the scroll bar. In that case you should draw as fast as the scroll events occur to give a nice smooth experience.
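A minimal sketch of that batched-refresh idea (the class and helper names are hypothetical): value changes only mark a cell as dirty, and a timer invalidates the dirty cells a few times a second, so many rapid changes to one cell cost a single repaint.

    using System;
    using System.Collections.Generic;
    using System.Drawing;
    using System.Windows.Forms;

    class BatchedGridRefresher
    {
        readonly Control grid;                                      // the grid control being painted
        readonly HashSet<Point> dirtyCells = new HashSet<Point>();  // (col, row) pairs changed since last flush
        readonly Timer timer = new Timer();
        readonly Func<Point, Rectangle> cellBounds;                 // maps a cell to its pixel rectangle

        public BatchedGridRefresher(Control grid, Func<Point, Rectangle> cellBounds)
        {
            this.grid = grid;
            this.cellBounds = cellBounds;
            timer.Interval = 250;                                   // 4 refreshes per second
            timer.Tick += (s, e) => Flush();
            timer.Start();
        }

        // Called from wherever the data changes; cheap, no painting happens here.
        public void MarkDirty(int col, int row)
        {
            lock (dirtyCells) dirtyCells.Add(new Point(col, row));
        }

        void Flush()
        {
            Point[] cells;
            lock (dirtyCells)
            {
                cells = new Point[dirtyCells.Count];
                dirtyCells.CopyTo(cells);
                dirtyCells.Clear();
            }
            foreach (var cell in cells)
                grid.Invalidate(cellBounds(cell));                  // only changed cells are repainted
        }
    }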
We use the Syncfusion grid control and from what I've seen it's pretty flexible if you take the time to modify it. I don't work with the control myself, one of my co-workers does all of the grid work but we've extended it to our needs pretty well including custom painting.
I know this isn't exactly answering your question, but writing a control like this from scratch is always going to be much more complicated than you anticipate, regardless of your anticipations. Since it'll be constantly updating, I assume it's going to be data-bound, which will be a chore in itself, especially to get it to be highly performant. Then there's debugging it.
Try the grid from DevExpress or ComponentOne. I know from experience that the built-in grids are never going to be fast enough for anything but the most trivial of applications.
I have been planning to build a grid control that does the same as a pastime, but still haven't found the time. Most of the commercial grid controls have a big memory footprint, and updating is typically an issue.
My tips would be (if you go the custom-control route):
1. Extend a Control (not UserControl or something similar). It will give you speed, without losing much.
2. In my case I was targeting the grid to contain more data, say a million rows with some 20-100-odd columns. In such scenarios it usually makes more sense to draw it yourself. Do not try to represent each cell by some Control (like, say, Label, TextBox, etc.). They eat up a lot of resources (window handles, memory, etc.).
3. Go MVC.
The idea is simple: at any given time, you can display only a limited amount of data, due to screen size limitations, human eye limitations, etc.
So your viewport is very small even if you have a gazillion rows and columns, and the number of updates you have to do is no more than about 5 per second to be at all useful to read, even if the data behind the grid is being updated a gazillion times per second. Also remember that even if the text/image to be displayed per cell is huge, the user is still limited by the cell size.
Caching styles (a generic word to represent text sizes, fonts, colors, etc.) also helps in such a scenario, depending on how many of them you will be using in your grid.
There will be a lot more work in getting the basic drawing (highlights, grid lines, boundaries, borders, etc.) done to achieve various effects.
I don't recall exactly, but there was a C# .NET grid on SourceForge which can give you a good idea of how to start. That grid offered 2 options: a VirtualGrid, where the model data is not held by the grid, making it very lightweight, and a real (traditional) grid, where the data storage is owned by the grid itself (mostly creating a duplicate, but it depends on the application).
For a grid that is super-agile in terms of updates, it might just be better to have a "VirtualGrid".
Just my thoughts
