How to deal with large objects?

How to deal with large objects? - c#

I have 5 types of objects: place info (14 properties),owner company info (5 properties), picture, ratings (stores multiple vote results), comments.
All those 5 objects will gather to make one object (Place) which will have all the properties and information about all the Place's info, pictures, comments, etc
What I'm trying to achieve is to have a page that displays the place object and all it's properties. another issue, if I want to display the Owner Companies' profiles I'll have object for each owner company (but I'll add a sixth property which is a list of all the places they own)
I've been practicing for a while, but I never got into implementing and performance experience, but I sensed that it was too much!
What do you think ?

You have to examine the use case scenarios for your solution. Do you need to always show all of the data, or are you starting off with displaying only a portion of it? Are users likely to expand any collapsed items as part of regular usage or is this information only used in less common usages?
Depending on your answers it may be best to fetch and populate the entire page with all of the data at once, or it may be the case that only some data is needed to render the initial screen and the rest can be fetched on-demand.
In most cases the best solution is likely to involve fetching only the required data and to update the page dynamically using ajax queries as needed.
As for optimizing data access, you need to strike a balance between the number of database requests and the complexity of each individual request. Because of network latency it is often important to fetch as much as possible using as few queries as possible, even if this means you'll sometimes be fetching data that you do not always need. But if you include too much data in a single query, then computing all the joins may also be costly. It is quite rare to see a solution in which it is better to first fetch all root objects and then for every element go fetch some additional objects associated with that element. As such, design your solution to fetch all data at once, but include only what you really need and try to keep the number of involved tables to a minimum.

You have 3 issues to deal with really, and they are often split into DAL, BLL and UI
Your objects obviously belong in the BLL and if you're considering performance then you need to consider how your objects will be created and how they interface to the DAL. I have many objects with 50-200 properties so 14 properties is really no issue.
The UI side of it is seperate, and if you're considering the performance of displaying a lot of information onto a single page you'll consider tabbed content, grids etc.
Tackle it one thing at a time and see where your problems lie.

Related

Loading /Lazy loading of related entities

Scenario:
I have a (major) design problem. I have DTO classes to fill data from DB and use in the UI. The scenario I have is that:
I have a HouseObject which has TenantObject (one to many) with each tenant has AccountObject (one to many again) and so on (Example Scenario Only)
Problem:
Now my issue is, while retrieving data from DB for HouseObject, Should I get list of all TenantObjects and inturn list of all AccountObjects and so on? because of the one to many relationship, for one HouseObject potentially we are retrieving huge data for Tenants, Accounts and so on.
Should we just retrieve just HouseObject and fire off individual dependent queries per dependency? or should I get all data at once in single call and bind it on screen. Which is the desired solution?
Please advice.

If you looking for performance, and this is what I think you're looking for, you have to think in broad terms, not just Lazy/not lazy. You have to think, how much data you have, and how often it gets updated; where is it stored? Where application runs, how it is utilized, etc.
I see few scenarios:
Lazy load of small chunks. I like this one because it is useful when your data is modified often. You have fresh data all the time.
Caching in application layer. Here can be 2 sub-scenarios
a. Caching ready-models. You prepare your models, fully initialized.
b. Caching separate segments of data (houses, tenants, accounts) and query it in memory.
Prepare your denormalized joined data and store it in Materialized (Oracle) or Indexed (Sql Server) view. You still will have a trip to DB but it will be more efficient than join data each time, or make multiple calls each time.
Combination of #1 and #2.b and I like it the most. It requires more coding but gets you best results in performance and concurrency. Even if your data is mutating, you can have a mechanism to dump the cache. And if your Account changes, you don't need to dump Tenant.
And one more thing, if you need to update your data, remember - use different models. Your display models should be only ViewModel while you should have separate model for your Save. For example, your Save model may have field updatedBy while your View model doesn't.
You really don't have a "major" problem. It is normal daily problem for developers. You need to think of all aspects of your system.

Sorting in Array vs Sorting in SQL

I have around 1000 rows of data.On the ASPX page, whenever the user clicks the sort button, it will basically sort the result according to a specific column.
I propose to sort the result in the SQL query which is much more easier with just an Order by clause.
However, my manager insisted me to store the result in an array, then sort the data within an array because he thinks that it will affect the performance to call the database everytime the user clicks the sort button.
Just out of curiosity - Does it really matter?
Also, if we disregard the number of rows, performance wise, which of these methods is actually more efficient?

Well, there are three options:
Sort in the SQL
Sort server-side, in your ASP code
Sort client-side, in your Javascript
There's little reason to go with (2), I'd say. It's meat and drink to a database to sort as it returns data: that's what a database is designed to do.
But there's a strong case for (3) if you want to have a button that the user can click. This means it's all done client-side, so you have no need to send anything to the web server. If you have only a few rows (and 1000 is really very few these days), it'll feel much faster, because you won't have to wait for sending the request and getting a response.
Realistically, if you've got so many things that Javascript is too slow as a sorting mechanism, you've got too many things to display them all anyway.
In short, if this is a one-off thing for displaying the initial page, and you don't want the user to have to interact with the page and sort on different columns etc., then go with (1). But if the user is going to want to sort things after the page has loaded, then (3) is your friend.

Short Answer
Ah... screw it: there's no short answer to a question like this.
Longer Answer
The best solution depends on a lot of factors. The question is somewhat vague, but for the sake of simplicity let's assume that the 1000 rows are stored in the database and are being retrieved by the client.
Now, a few things to get out of the way:
Performance can mean a variety of things in a variety of situations.
Sorting is (relatively) expensive, no matter where you do it.
Sorting is least expensive when done in the database, as the database already has the all the necessary data and is optimized for these operations.
Posting a question on SO to "prove your manager wrong" is a bad idea. (The question could easily have been asked without mentioning the manager.)
Your manager believes that you should upload all the data to the client and do all the processing there. This idea has some merit. With a reasonably sized dataset processing on the client will almost always be faster than making a round trip to the server. Here's the caveat: you have to get all of that data to the client first, and that can be a very expensive operation. 1000 rows is already a big payload to send to a client. If your data set grows much larger then you would be crazy to send all of it at once, particularly if the user really only needs a few rows. In that case you'll have to do some form of paging on the server side, sending chunks of data as the user requests it, usually 10 or 20 rows at a time. Once you start paging at the server your sorting decision is made for you: you have no choice but to do your sorting there. How else would you know which rows to send?
For most "line-of-business" apps your query processing belongs in the database. My generalized recommendation: by all means do your sorting and paging in the database, then return the requested data to the client as a JSON object. Please don't regenerate the entire web page just to update the data in the grid. (I've made this mistake and it's embarrassing.) There are several JavaScript libraries dedicated solely to rendering grids from AJAX data. If this method is executed properly your page will be incredibly responsive and your database will do what it does best.

We had a problem similar to this at my last employer. we had to return large sets of data efficiently, quickly and consistently into a datagridview object.
The solution that they came up was to have a set of filters the user could use to narrow down the query return and to set the maximum number of rows returned at 500. Sorting was then done by the program on an array of those objects.
The reasons behind this were:
Most people will not not process that many rows, they are usually looking for a specific item (Hence the filters)
Sorting on the client side did save the server a bunch of time, especially when there was the potential for thousands of people to be querying the data at the same time.
Performance of the GUI object itself started to become an issue at some point (reason for limiting the returns)
I hope that helps you a bit.

From both a data-modeling perspective and from an application architecture pattern, its "best practice" to put sorting/filtering into the "controller" portion of the MVC pattern. That is directly opposed to the above answer several have already voted for.
The answer to the question is really: "It depends"
If the application stays only one table, no joins, and a low number of rows, then sorting in JavaScript on the client is likely going to win performance tests.
However, since it's already APSX, you may be preparing for your data/model to expand.--Once there are more tables and joins, and if the UI includes a data grid where the choice of which column to sort will change on a per-client basis, then maybe the middle-tier should be handling this sorting for your application.
I suggest reviewing Tom Dykstra's classic Contosa University ASP.NET example which has been updated with Entity Framework and MVC 5. It includes a section on Sorting, Filtering and Paging. This example shows the value of proper MVC architecture and the ease of implementing sorting/filtering on multiple columns.
Remember, applications change (read: "grow") over time so plan for it using an architecture pattern such as MVC.

WCF performance

We have a table in our database which has around 2,500,000 rows (around 3GB). Is it technically possible to view the data in this table in a silverlight application which queries this data using WCF? Potentially, I see issues with the maximum buffer size and timeout errors. We may need the entire data to be used for visualization purposes.
Please guide me if there is a practical solution to this problem.

Moving 3GB to a client is not going to work.
for visualization purposes.
Better prepare the visualization server-side. That will be slow enough.

Generally in this sort of situation if you need to view individual records then you would use a paging strategy. So your call to WCF would be for a page worth of records and you would display those records and the user would click on a next / previous button or some such.
As for the visualisation you should look to perform some transformation / reduction on the server as 2.5 million records is akin to displaying one data point per pixel on your screen.

First of all, have a look here.
Transfering 3GB of data from Disk to Disk can take quite a few minutes let alone on crossing across the network. I think you have got bigger fishes to fry - WCF limitation is irrelevant here.
So let's assume after a few minutes/hours you got the data across teh wire, where do you store it? You Silverlight app if running inside the browser can not grow to 3GB (even on a 64bit machine) and even it could, it does not make any sense. Especialy that amount of data when transformed into objects will take a lot more space.
Here is what I would do:
Get the server to provide snapshots/views of the data that is useful, e.g. providing summary, OLAP cubes, ...
For each record, provide minimum data required.
If you need detail on each record, do that in a separate call

Well, I believe and suggest that you're not going to show 2,5 milion rows in the same listing.
If you develop a good paging of data and the way you query the data is optimal, I don't find the problem with WCF.
I'm agree with querying data with a WCF interface is less efficient than a standalone, direct access to infraestructure solution, but if you need to host some business and data and N clients to access that in a SOA solution, or it's a client-server solution, you'll need to be sure that your queries are efficient.
Suggestions:
Use an OR/M. NHibernate will be your best choice, since it has a lot of ways of tweaking performance and paging is made easy because of it's LINQ support through QueryOver API in NHibernate 3.0. This product has a very interesting caching scheme and it'll let your application efficiently visualize your 2,5 milion-rows database.
Do caching. NHibernate may help you in this area, but think about that and, depending on the client technology (Web, Windows...), you'll find good options for caching presentation views (ASP.NET output caching, for example).
Think about how you're going to serialize objects in WCF: SOAP or JSON? Maybe you would be interested in JSON because serialized objects are tiny enough in order to save network trafic.
If you have questions, just comment out!

Ok, after many users talk about what you do there technically - what is the sense someone without thinking thought you have there?
2.5 million rows make no sensein a grid. Zero. Showing 80 rows per page (wide sdcreen, tilted 90 degree) that would be 31250 pages worth of data. You can not even scripp to a specific page. Ignoring load times -even IF (!) you load that etc., it just makes no sense to have this amount ina grid. Filter it down, then load what you need page wise. But the key here is to force the user to filter BEFORE even thinking about a grid. And once you ahve them, lets not get into takling abuot the performance of the grid.
To show you how bad this is. For get the grid. If you assign ONE PIXEL or every data item, you take 1.33 screens of 1024*768 pixels to show the data. THis is one pixel per item.
So, at the end of the day, even IF (which is impossible) to manage to get this working, you end up with a non sensical / non usable applciation.

MongoDB relationships for objects

Please excuse my english, I'm still trying to master it.
I've started to learn MongoDB (coming from a C# background) and I like the idea of what is MongoDB. I have some issues with examples on the internet.
Take the popular blog post / comments example. Post has none or many Comments associated with it. I create Post object, add a few Comment objects to the IList in Post. Thats fine.
Do I add that to just a "Posts" Collection in MonoDB or should I have two collections - one is blog.posts and blog.posts.comments?
I have a fair complicated object model, easiest way to think of it is as a Banking System - ours is mining. I tried to highlight tables with square brackets.
[Users] have one or many [Accounts] that have one or many [Transactions] which has one and only one [Type]. [Transactions] can have one or more [Tag] assigned to the transaction. [Users] create their own [Tags] unique to that user account and we sometimes need to offer reporting by those tags (Eg. for May, tag drilling-expense was $123456.78).
For indexing, I would have thought seperating them would be good but I'm worried it is bad practice this thinking from old RBDMS days.
In a way, its like the blog example. I'm not sure if I should have 1 [Account] Collection and persist all information there, or have an intermediate step that splits it up to seperate collections.
The other related query is, when you persist back and forth, do you usually get back everything associated with that record - even if not required or do you limit?

It depends.
It depends on how many of each of these type of objects you expect to have. Can you fit them all into a single MongoDB document for a given User? Probably not.
It depends on the relationships - is user-Account a one-to-many or a many-to-many relationship? If it's one to many and the number of Accounts is small you might chose to put them in an IList on a User document.
You can still model relationships in MongoDB with separate collections BUT there are no joins in the database so you have to do that in code. Loading a User and then loading their Accounts might be just fine from a performance perspective.
You can index INTO arrays on documents. Don't think of an Index as just being an index on a simple field on a document (like SQL). You can use, say, a Tag collection on a document and index into the tags. (See http://www.mongodb.org/display/DOCS/Indexes#Indexes-Arrays)
When you retrieve or write data you can do a partial read and a partial write of any document. (see http://www.mongodb.org/display/DOCS/Retrieving+a+Subset+of+Fields)
And, finally, when you can't see how to get what you want using collections and indexes, you might be able to achieve it using map reduce. For example, to find all the tags currently in use sorted by their frequency of use you would map each document emitting the tags used in it, and then you would reduce that set to get the result you want. You might then store the result of that map reduce permanently and only up date it when you need to.
One further concern: You mention calculating totals by tag. If you want accounting-quality transactional consistency, MongoDB might not be the right choice for you. "Eventual-consistency" is the name of the game for NoSQL data stores and they generally aren't a good fit for financial transactions. For example, it doesn't matter if one user sees a blog post with 3 comments while another sees 4 because they hit different replica copies that aren't in sync yet, but for a financial report, that kind of consistency does matter - your report might not add up!

many records: create business objects 1 by 1 or all at the same time?

I have a table with about 10.000 records (plus detail data in 11 tables). I want to display the data in a browsable form (1st, previous & next, last buttons).
I thought about retrieving all the records, putting the data into a collection of business objects and binding the form to the collection. Thinking about it, it came to my mind that it could take a while and could possibly result in a lot of memory being used ...
So maybe I should just retrieve the first record and get the next one when requested? What Do you think?

If you're not using an OR/M like nHibernate or iBatis, let me suggest that now. I use nHibernate - but I hear great things about iBatis. They have batching and lazy loading built in and configurable. Plus, the next time a situation like this arises, you'll have this functionality in minutes thanks to a pre-established flexible data access layer.
If you're not going to go the route of an OR/M, I'd suggest one-by-one fetching until performance becomes an issue - why complicate things unnecessarily upfront? If performance becomes an issue, then consider batch fetching and caching.

If the data is used frequently, but does not change often, you could consider caching it on the application level. If you retrieve the data on form level, a probably good compromise between speed and memory consumption would be to get e.g. 11 records (current, 5 previous and 5 next), and if the user goes past the retrieved records, get the next 5 again etc.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.