I am setting up a web service and I'm unsure about the most efficient design.
The web service is going to be called to populate a remote website with a list of people and their associated information. So, in the database there will be a "Person" table, then other tables for nicknames, addresses, phone numbers, etc. There may be 50 "People" returned for the page.
I would think I'd want to keep the number of web service calls to a minimum. So is the best way to just have one call, GetPeople(), which returns all people and their information in one XML response? Or would I want GetPeople() to return just a list of people, then have a GetPeopleInfo() call for each person?
Thanks in advance
Will you have an opportunity to talk a bit more with the folks using the web service? The best answer will depend on a number of factors, including how they are going to use it.
The simplest solution would be to create one web service method, GetPeople(), that executes a query joining the Person table with the rest of the tables and returns all of the data in one flat table. If the client is just going to display all of this information in a single list/table, this would be the best option all around.
If, on the other hand, the client is going to generate a list of people and then have a click-through to more detailed information on a separate detail page, they might want a GetPeople() method that just returns the names/IDs and a GetPeopleInfo() that pulls back the detail. If this isn't going to cause a performance problem on your system, it should be relatively straightforward.
I would probably build both: a GetPeople() method that brings back all the data (as long as there isn't so much data that transport becomes an issue) and a GetPersonInfo() method that allows them to pull back details on a specific person. This offers your client the greatest flexibility without too much additional effort on your part.
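For concreteness, a minimal sketch of that two-method contract, assuming a WCF-style service (all type and member names here are illustrative, not from the question):

using System.Collections.Generic;
using System.ServiceModel;

// Illustrative DTO; the real field list would come from the Person,
// nickname, address and phone tables.
public class PersonDetail
{
    public int Id { get; set; }
    public string Name { get; set; }
    public List<string> Nicknames { get; set; }
    public List<string> PhoneNumbers { get; set; }
}

[ServiceContract]
public interface IPeopleService
{
    // One round trip: every person plus their associated info.
    [OperationContract]
    List<PersonDetail> GetPeople();

    // Full detail for a single person, for a click-through detail page.
    [OperationContract]
    PersonDetail GetPersonInfo(int personId);
}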
You need to think about how your service is going to be used.
Will the users require all of that information most or all of the time? If so, then it's appropriate for GetAllPeoplesDetails() to return everything.
If they're only rarely going to need all the info, then GetPeople() should return just a list of people.
If you're not sure, you could provide both GetAllPeoplesDetails() and GetPeople() and let the end user make that decision.
Giving your users options, especially when they are dispersed, is your best option in my opinion. Having options such as:
GetPeopleList(): returns a list of people's names
GetPeopleAll(): returns a list of all people and all their info
GetPersonInfo(): returns the info for one person
Just my opinion, but by giving your people a choice you're making your application more useful to them.
I think your GetPeople() call is appropriate. You can return a very lean result set, and the end user can call GetPeopleInfo() if they need more. If the result set is large, you may want to provide a paging option and a result count in your method, allowing the caller to pull back a certain number of rows at a time.
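As a hedged sketch of that paging idea (the wrapper type and names are mine, reusing the PersonDetail DTO sketched earlier):

using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper so one call returns both the requested page and the
// total row count the caller needs to render paging links.
public class PagedPeopleResult
{
    public int TotalCount { get; set; }
    public List<PersonDetail> People { get; set; }
}

public class PeopleService
{
    private readonly IQueryable<PersonDetail> allPeople; // backed by the database

    public PeopleService(IQueryable<PersonDetail> allPeople)
    {
        this.allPeople = allPeople;
    }

    public PagedPeopleResult GetPeople(int pageIndex, int pageSize)
    {
        // Assumes the source query has a stable ordering; only the
        // requested page is materialized.
        return new PagedPeopleResult
        {
            TotalCount = allPeople.Count(),
            People = allPeople.Skip(pageIndex * pageSize).Take(pageSize).ToList()
        };
    }
}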
XML is fine.
JSON is better.
Dave Ward at Encosia uses DataTransferObjects to serialize his "Person" objects.
I keep linking to this article because it's done so well, and this is a common problem.
Hope this helps, JP
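For what it's worth, once the data is in a flat DTO (like the PersonDetail sketch above), JSON serialization is a one-liner. JavaScriptSerializer ships with ASP.NET; a library such as Json.NET works the same way:

using System.Web.Script.Serialization;

// Flat DTOs serialize cleanly with no circular references to trip over.
var serializer = new JavaScriptSerializer();
string json = serializer.Serialize(new PersonDetail { Id = 1, Name = "Ada" });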
I have an application with several Web API controllers, and now I have a requirement to be able to filter GET results by the objects' properties. I've been looking at using OData, but I'm not sure it's a good fit, for a couple of reasons:
The Web API controller does not have direct access to the DataContext, instead it gets data from our database through our "domain" layer so it has no visibility into our Entity Framework models.
Tying into the first item, the Web API deals with lightweight DTO model objects which are produced in the domain layer. This is effectively what hides the EF models. The issue here is that I want these queries to be executed in our database, but by the time the Web API method gets a collection from the domain layer, all of the objects in the collection have been mapped to these DTO objects, so I don't see how the OData filter could possibly do its job when the objects are once-removed from EF in this way.
This item may be the most important one: We don't really want to allow arbitrary querying against our Web API/Database. We just sort of want to leverage this OData library to avoid writing our own filters, and filter parsers/builders for every type of object that could be returned by one of our Web API endpoints.
Am I on the wrong track based on #3? If not, would we be able to use this OData library without significant refactoring to how our Web API and our EF interact?
I haven't had experience with OData, but from what I can see it's designed to be fed a context, and it manages the interaction with and the returning of those models. I am definitely not a fan of returning entities in any form to a client.
It's an ugly situation to be in, but when faced with this, my first course of action is to push back to the clients to justify their searching needs. The default request is almost always "Well, it would be nice to be able to search against everything." My answer to that is that I don't want to know what you want, I want to know what you need because I don't want to give you a loaded gun to shoot your own foot off with and then have you blame me because the system came grinding to a halt. Searching is a huge performance killer if it's too open-ended. It's hard to test for accuracy/relevance, and efficiently index for 100% of possible search cases when users only need 25% of those scenarios. If the client cannot tell you what searching they will need, and just want everything because they might need it, then they don't need it yet.
Personally, I stick to specific search DTOs and translate them into LINQ expressions.
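To illustrate (all type and property names here are made up, and Order stands in for an EF entity hidden behind the domain layer):

using System;
using System.Linq;

// Stand-in for the real EF entity.
public class Order
{
    public string CustomerName { get; set; }
    public DateTime PlacedDate { get; set; }
}

// A search DTO that exposes only the criteria we actually support.
public class OrderSearchCriteria
{
    public string CustomerName { get; set; }
    public DateTime? PlacedAfter { get; set; }
}

public static class OrderSearch
{
    // Translate the DTO into LINQ expressions inside the domain layer, so the
    // filtering executes in the database rather than against materialized DTOs.
    public static IQueryable<Order> ApplyCriteria(IQueryable<Order> orders, OrderSearchCriteria c)
    {
        if (!string.IsNullOrEmpty(c.CustomerName))
            orders = orders.Where(o => o.CustomerName.Contains(c.CustomerName));
        if (c.PlacedAfter.HasValue)
            orders = orders.Where(o => o.PlacedDate >= c.PlacedAfter.Value);
        return orders;
    }
}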
If I was faced with a hard requirement to implement something like that, I would:
Try to push for these searches/reports to be done off a reporting replica that is synchronized with the live database. (To minimize the bleeding when some idiot managers fire up some wacky non-indexed search criteria so that it doesn't tie up the production DB where people are trying to do work.)
Create a new bounded DbContext specific to searching, with separate entity definitions that expose only the minimum number of properties needed to represent search criteria and IDs (sketched below).
Hook this bounded context into the API and OData. It will return "search results". When a user selects a search result, use the ID(s) against the API to load the applicable domain object, initiate an action, etc.
No. 1 is optional, a nice-to-have, provided they can live with searches not "seeing" updated data until it has replicated (i.e. a few seconds to minutes, depending on replication strategy/size). Normally these searches are used for reporting-type queries, so I'd push to keep them separate from the normal day-to-day search options that users use (i.e. an advanced search option or the like).
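A rough sketch of that bounded search context (entity, table and property names are invented; EF6-style API):

using System.Data.Entity;

// Slim projection entity: just enough columns to search on plus the ID.
public class PersonSearchResult
{
    public int PersonId { get; set; }
    public string FullName { get; set; }
    public string City { get; set; }
}

// Bounded DbContext used only by the search/OData endpoint; it maps the same
// tables as the main context but exposes none of the full entities.
public class SearchContext : DbContext
{
    public DbSet<PersonSearchResult> People { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Entity<PersonSearchResult>()
                    .HasKey(p => p.PersonId)
                    .ToTable("Person");
    }
}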
I'm using ArcGIS.PCL with C# to query information from an ArcGIS Server REST web service. I know how to query a specific layer to see all its fields and information in general. But how can I query the server to return the list of layers?
I can use this URL for a specific layer (id=0): http://server/arcgis/rest/services/myassets/assets/MapServer/0
but if I don't know the ID of the layer, what can I do to iterate through all of them?
I know I can use this URL: http://server/arcgis/rest/services/myassets/assets/MapServer/ and the server returns all the information, but I don't know which method to use from this ArcGIS.PCL library to map the results to classes.
Also, if I query data from a specific layer and its fields, what parameters do I use to return all the info from all the fields? At the moment I use "*" for outFields and "1=1" for the where clause, but it feels a bit hackish.
Has anyone got experience with this library?
Thanks!
There isn't an operation defined for that yet, though there is still a way to do it. The test project has an example that just maps the result to a dictionary, though you can define your own type for it if you prefer.
If you want to get a collection of services for the site you can use DescribeSite.
Using * for outFields is correct if you want all the fields returned; otherwise you need to list the ones you want. Some where clause is required, as otherwise ArcGIS Server will throw an error, so 1=1 is the simplest way to get all data.
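Putting the two answers together, something like the following; this is from memory of the ArcGIS.PCL README, so treat the namespaces, type names and result shape as assumptions and verify them against your version of the library:

using System.Collections.Generic;
using ArcGIS.ServiceModel;            // namespaces assumed from the library
using ArcGIS.ServiceModel.Common;
using ArcGIS.ServiceModel.Operation;

// Inside an async method:
var gateway = new PortalGateway("http://server/arcgis");

// DescribeSite crawls the site and returns the services/layers it finds.
var site = await gateway.DescribeSite();

// Query a single layer: "*" returns every field, "1=1" matches every row.
var query = new Query("myassets/assets/MapServer/0".AsEndpoint())
{
    Where = "1=1",
    OutFields = new List<string> { "*" }
};
var result = await gateway.Query<Point>(query);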
I'm looking for a design solution for a pattern that I am going to have to repeat quite a lot throughout a website I am designing. It is going to be ASP.NET MVC front-end, with C# WCF web services connecting using NHibernate to SQL database.
It's a social networking site, so imagine Facebook to get a conceptual idea. What I'm looking for is an efficient and performant way to return paginated results of large datasets. For example, a user may have 150 emails, and I want to return them 10 at a time depending on what page they're on, only returning the 10 that relate to that page rather than loading all 150 items into memory and displaying 10 at a time. I think the user experience is better with a slightly longer delay on changing pages than with a slower initial load; after all, when do you look at emails six months old? The usual case is that you only care about the first page of results anyway.
Similarly, a user may have had a number of interactions since their last login (e.g. your notifications feed on Facebook), but again I only want to load N results at a time. In this instance, rather than having pages, you would click a "Display more" button which would fetch the next N results and display them with another "Display more" link, and so forth until you reach the end of the dataset. I imagine both would use the same design, though, as they are technically both paginated results, just with different UI output and flow.
Can anyone offer advice on a good design for this, bearing in mind my data retrieval uses NHibernate Queryables or Enumerables? Should I load all the data from the DB in one hit, then use an iterator pattern to return N rows at a time from the service layer, keeping the rest of the list in memory on the server in the user's session context, so that another call for the next N rows finds it in place and keeps returning N rows until the iterator finishes? Or would it be best to simply retrieve N rows from the database and return those, holding nothing in session context? I can see how to return the top 10 results from a Queryable:
var results = (from email in emails where email.UserId == userId select email).Take(10);
But I'm not sure how efficient this is. Is this the fastest way of doing it? And furthermore, I don't see how to start at a certain position; this will always return only the first 10, not the second 10, the third 10, and so on.
So I'm a bit unsure how best to proceed and was hoping for pointers and advice from people who have done something similar. Bear in mind that performance is of the essence for my website, so the user experience needs to be sharp and interactive, with fresh results. Basically, if you were trying to simulate a Facebook news feed/wall, how would you implement it with the above architecture?
Thanks!
You can use Skip in combination with Take:
var results = (from email in emails
               where email.UserId == userId
               select email)
              .Skip((currentPage - 1) * 10)
              .Take(10);
About the web service: you really should make it a stateless web service. You could use ASP.NET Web API for this, which enables you to build a RESTful web service.
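A minimal sketch of that stateless approach with Web API (the controller, the Email type and the route are all illustrative):

using System;
using System.Linq;
using System.Web.Http;

// Illustrative DTO.
public class Email
{
    public int UserId { get; set; }
    public DateTime SentDate { get; set; }
    public string Subject { get; set; }
}

public class EmailsController : ApiController
{
    private readonly IQueryable<Email> emails; // supplied by your DI container, e.g. from NHibernate

    public EmailsController(IQueryable<Email> emails)
    {
        this.emails = emails;
    }

    // GET /api/emails?userId=42&page=2 -- every request is self-describing,
    // so nothing is held in session and the service stays stateless.
    public IHttpActionResult Get(int userId, int page = 1, int pageSize = 10)
    {
        var results = emails
            .Where(e => e.UserId == userId)
            .OrderByDescending(e => e.SentDate) // paging needs a stable order
            .Skip((page - 1) * pageSize)
            .Take(pageSize)
            .ToList();
        return Ok(results);
    }
}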
Should I load all the data from the DB in one hit...
Definitely not, you only want to pull down the records you need, not the ones you may need.
...use an iterator pattern to return N rows at a time from the service layer, keeping the rest of the list in memory on the server in the user's session context...
Scalability goes right out the window with that idea.
...or would it be best to simply retrieve N rows from the database and return those, holding nothing in session context?
Now you're starting to get on the right track...
In general, you want to let the database do as much of the querying as possible, i.e. you don't want to hit the database and then have to further query the results (though that's not always avoidable). In other words, you want to delegate most, if not all, of the heavy lifting to the database.
You mentioned you are using NHibernate, which is a pretty powerful ORM. The good news is that it does a lot of the work for you in terms of query optimization, caching data, etc. Like most ORMs nowadays, NHibernate uses deferred execution for its queries, so just watch out for things like hitting the database too early, and choose when to eager-load data instead of performing multiple queries. There is a lot to learn with NHibernate; if you haven't already, it's worth taking the time to read up on it before diving in, as it will save you a lot of hassle in the long run.
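A small example of that deferred-execution point with NHibernate's LINQ provider (Email and Sender are placeholder mapped types, and session is an open NHibernate ISession):

using System.Linq;
using NHibernate.Linq;

// Nothing hits the database here; this only builds the query.
var query = session.Query<Email>()
                   .Where(e => e.UserId == userId)
                   .Fetch(e => e.Sender); // eager-load the association instead of one query per row

// The database is hit here, when the results are materialized.
var page = query.Skip(10).Take(10).ToList();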
Bearing in mind with my website performance is going to be of the essence so the user experience needs to be pretty sharp and interactive with refreshing new results
In terms of performance (I assume you mean page load speeds), you would want to ajaxify your site, i.e. load what needs to be loaded with the page, pull the rest in the background and update the page dynamically. To achieve the "refreshing new results" part, you need to look at polling the server and pulling down new data. I am pretty sure Facebook uses a technique called long polling, which essentially keeps a request open with the server for a set amount of time so that the data appears to arrive "instantly". Polling is a different ball game altogether, though; it's about striking a balance between server load and how "fresh" the data needs to be. That's something you would need to decide yourself, and the answer usually depends on the type of data versus the hardware capabilities of the server.
There are some links about this (like this one) out there, but I liked this guy's approach. I don't know if I'd use his PagedQueryable, but his IPageable, IPagedEnumerable and PagedEnumerable are really interesting. Besides, his project introduction page may give you some ideas on how to roll your own pagination.
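The shape of those types is roughly this (my paraphrase of the idea, not the author's actual code):

using System.Collections.Generic;

// One page of results plus the metadata a UI needs to render page links
// or a "Display more" button.
public interface IPagedEnumerable<T> : IEnumerable<T>
{
    int PageIndex { get; }
    int PageSize { get; }
    int TotalCount { get; }
}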
I have 5 types of objects: place info (14 properties), owner company info (5 properties), pictures, ratings (storing multiple vote results), and comments.
All five objects combine into one object (Place) which has all the properties and information about the place: its info, pictures, comments, etc.
What I'm trying to achieve is a page that displays the Place object and all its properties. Another issue: if I want to display the owner companies' profiles, I'll have an object for each owner company (but I'll add a sixth property, which is a list of all the places they own).
I've been practicing for a while, but I have no real implementation or performance experience yet, and this felt like too much!
What do you think ?
You have to examine the use case scenarios for your solution. Do you need to always show all of the data, or are you starting off with displaying only a portion of it? Are users likely to expand any collapsed items as part of regular usage or is this information only used in less common usages?
Depending on your answers it may be best to fetch and populate the entire page with all of the data at once, or it may be the case that only some data is needed to render the initial screen and the rest can be fetched on-demand.
In most cases the best solution is likely to involve fetching only the required data and to update the page dynamically using ajax queries as needed.
As for optimizing data access, you need to strike a balance between the number of database requests and the complexity of each individual request. Because of network latency it is often important to fetch as much as possible using as few queries as possible, even if this means you'll sometimes be fetching data that you do not always need. But if you include too much data in a single query, then computing all the joins may also be costly. It is quite rare to see a solution in which it is better to first fetch all root objects and then for every element go fetch some additional objects associated with that element. As such, design your solution to fetch all data at once, but include only what you really need and try to keep the number of involved tables to a minimum.
You really have three issues to deal with, and they are often split into DAL, BLL and UI.
Your objects obviously belong in the BLL, and if you're considering performance then you need to consider how your objects will be created and how they interface with the DAL. I have many objects with 50-200 properties, so 14 properties is really no issue.
The UI side of it is separate, and if you're considering the performance of displaying a lot of information on a single page, you'll want to consider tabbed content, grids, etc.
Tackle it one thing at a time and see where your problems lie.
In one of my applications, I query Active Directory to get a list of all users below a given user (using the "Direct Reports" relationship). So basically, given the name of a person, that person is looked up in AD and their direct reports are read. But then, for every direct report, the tool needs to check the direct reports of those direct reports. More abstractly: the tool uses a person as the root of a tree and then walks the complete tree to get the names of all the leaves (which can number several hundred).
Now my concern is obviously performance, as this needs to be done quite a few times. My idea is to cache it manually (essentially just put all the names in a long string, store that somewhere, and update it once a day).
But I just wonder if there is a more elegant way to first get the information and then cache it, possibly using something in the System.DirectoryServices Namespace?
To take control over which properties are cached, you can call RefreshCache(), passing the properties that you want to hang around:
System.DirectoryServices.DirectoryEntry entry = new System.DirectoryServices.DirectoryEntry();
// Pull the "cn" and "www" property values from AD into the local property cache.
entry.RefreshCache(new string[] { "cn", "www" });
Active Directory is pretty efficient at storing information, and retrieval shouldn't be that much of a performance hit. If you are really intent on storing the names, you'll probably want to store them in some sort of tree structure so you can see the relationships between all the people. Depending on the number of people, you might as well pull all the information you need daily and then run all requests against your cached copy.
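If you do roll your own, the tree walk itself is straightforward with System.DirectoryServices. A sketch (the LDAP path format assumes a standard AD setup; error handling and DN escaping are omitted):

using System.Collections.Generic;
using System.DirectoryServices;

// Walks the "directReports" attribute recursively, collecting the CN of
// every person below the given root. adPath is an LDAP path such as
// "LDAP://CN=Some Manager,OU=Staff,DC=example,DC=com" (hypothetical).
static void CollectReports(string adPath, List<string> names)
{
    using (var entry = new DirectoryEntry(adPath))
    {
        foreach (string reportDn in entry.Properties["directReports"])
        {
            using (var report = new DirectoryEntry("LDAP://" + reportDn))
            {
                names.Add((string)report.Properties["cn"].Value);
            }
            // Recurse into this report's own direct reports.
            CollectReports("LDAP://" + reportDn, names);
        }
    }
}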
AD does that sort of caching for you so don't worry about it unless performance becomes a problem. I have software doing this sort of thing all day long running on a corporate intranet that takes thousands of hits per hour and have never had to tune performance in this area.
Depends on how up to date you want the information to be. If you must have the very latest data in your report, then querying AD directly is reasonable. And I agree that AD is quite robust; a typical dedicated AD server is actually very lightly utilised in normal day-to-day operations, but it's best to check with your IT department / support person.
An alternative is to have a daily script dump the AD data into a CSV file and/or import it into a SQL database. (Oracle has a CONNECT BY feature for SELECT that can automatically create multi-level hierarchies within a result set; MSSQL can do a similar thing with a recursive CTE, IIRC.)