I have a Module class, a User, a UserModule and a UserModuleLevel class.
_module_objects is a static ObservableCollection of Modules that gets created when the program starts; there are about 10 of them, e.g. User Management, Customer Services, etc.
User, as you can probably guess, holds the user details: ID, Name, etc., populated from a db query.
For UserModules, I do not keep the module information in the db, just the module security levels. These are stored in the db as: User_ID, Module_ID, ModuleLevel, ModuleLevelAccess.
What I'm trying to do is populate an ObservableCollection of users in the fastest manner. I have about 120,000 users, usually these users only have access to 2 or 3 of the 10 modules.
Below is what I have tried so far; the piece with asterisks around it is the bottleneck, because it goes through every module of every user.
Hoping for some advice to speed things up.
public class UserRepository
{
    ObservableCollection<User> m_users = new ObservableCollection<User>();

    public UserRepository() { }

    public void LoadUsers()
    {
        var users = SelectUsers();
        foreach (var u in users)
        {
            m_users.Add(u);
        }
    }

    public IEnumerable<User> SelectUsers()
    {
        var userModulesLookup = GetUserModules();
        var userModuleLevelsLookup = GetUserModuleLevels().ToLookup(x => Tuple.Create(x.User_ID, x.Module_ID));

        clsDAL.SQLDBAccess db = new clsDAL.SQLDBAccess("DB_USERS");
        db.setCommandText("SELECT * FROM USERS");
        using (var reader = db.ExecuteReader())
        {
            while (reader.Read())
            {
                var user = new User();
                var userId = NullSafeGetter.GetValueOrDefault<int>(reader, "USER_ID");

                user.User_ID = userId;
                user.Username = NullSafeGetter.GetValueOrDefault<string>(reader, "USERNAME");
                user.Name = NullSafeGetter.GetValueOrDefault<string>(reader, "NAME");
                user.Job_Title = NullSafeGetter.GetValueOrDefault<string>(reader, "JOB_TITLE");
                user.Department = NullSafeGetter.GetValueOrDefault<string>(reader, "DEPARTMENT");
                user.Company = NullSafeGetter.GetValueOrDefault<string>(reader, "COMPANY");
                user.Phone_Office = NullSafeGetter.GetValueOrDefault<string>(reader, "PHONE_OFFICE");
                user.Phone_Mobile = NullSafeGetter.GetValueOrDefault<string>(reader, "PHONE_MOBILE");
                user.Email = NullSafeGetter.GetValueOrDefault<string>(reader, "EMAIL");

                user.UserModules = new ObservableCollection<UserModule>(userModulesLookup);

                //**************** BOTTLENECK **********************************
                foreach (var mod in user.UserModules)
                {
                    mod.UserModuleLevels = new ObservableCollection<UserModuleLevel>(userModuleLevelsLookup[Tuple.Create(userId, mod.Module.Module_ID)]);
                }
                //**************************************************************

                yield return user;
            }
        }
    }

    private static IEnumerable<Users.UserModule> GetUserModules()
    {
        foreach (Module m in ModuleKey._module_objects)
        {
            // Set a reference in the UserModule to the original static module.
            var user_module = new Users.UserModule(m);
            yield return user_module;
        }
    }

    private static IEnumerable<Users.UserModuleLevel> GetUserModuleLevels()
    {
        clsDAL.SQLDBAccess db_user_module_levels = new clsDAL.SQLDBAccess("DB_USERS");
        db_user_module_levels.setCommandText("SELECT * FROM USER_MODULE_SECURITY");
        using (var reader = db_user_module_levels.ExecuteReader())
        {
            while (reader.Read())
            {
                int u_id = NullSafeGetter.GetValueOrDefault<int>(reader, "USER_ID");
                int m_id = NullSafeGetter.GetValueOrDefault<int>(reader, "MODULE_ID");
                int ml_id = NullSafeGetter.GetValueOrDefault<int>(reader, "MODULE_LEVEL_ID");
                int mla = NullSafeGetter.GetValueOrDefault<int>(reader, "MODULE_LEVEL_ACCESS");

                yield return new Users.UserModuleLevel(u_id, m_id, ml_id, mla);
            }
        }
    }
}
In the end I'll put the users into a DataGrid with module security displayed, buttons with green show there is some type of access to this module, clicking on it will bring up actual security settings.
For performance gains you can do a few things:
Change your data access code to perform JOINs in SQL to get your data as a single result set.
SQL tends to be a fair bit faster at returning a result set of relational data than C# is at gluing the data together after the fact. The database is optimised to do exactly that, so take advantage of it; a rough sketch of the idea follows below.
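For the schema in the question that could look roughly like the following. This is only a sketch: it reuses the clsDAL and NullSafeGetter helpers from the question, and the column names are taken from the two SELECTs shown above.

// One round trip: users joined to their module security rows.
var usersById = new Dictionary<int, User>();

var db = new clsDAL.SQLDBAccess("DB_USERS");
db.setCommandText(@"SELECT u.USER_ID, u.USERNAME, u.NAME,
                           s.MODULE_ID, s.MODULE_LEVEL_ID, s.MODULE_LEVEL_ACCESS
                    FROM USERS u
                    LEFT JOIN USER_MODULE_SECURITY s ON s.USER_ID = u.USER_ID");

using (var reader = db.ExecuteReader())
{
    while (reader.Read())
    {
        int userId = NullSafeGetter.GetValueOrDefault<int>(reader, "USER_ID");

        // Create each user the first time its id appears; later rows only add security data.
        User user;
        if (!usersById.TryGetValue(userId, out user))
        {
            user = new User();
            user.User_ID = userId;
            user.Username = NullSafeGetter.GetValueOrDefault<string>(reader, "USERNAME");
            user.Name = NullSafeGetter.GetValueOrDefault<string>(reader, "NAME");
            // ... remaining user columns as in the original code
            usersById.Add(userId, user);
        }

        // Each joined row carries at most one module level, so attaching it to the matching
        // UserModule is a constant-time step here instead of the per-user scan in the original loop.
    }
}

Because the dictionary is keyed by USER_ID, each user object is materialised exactly once regardless of row order, and the nested per-user module loop from the bottleneck disappears.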
You should probably also consider paging the results - any user who says they need all 120,000 results at once should be slapped upside the head with a large trout. Paging the results will limit the amount of processing that you need to do in the application.
Doing the above can be quite daunting, as you would need to modify your application to include paging - often 3rd party controls such as grids have some paging mechanism built in, and these days most ORM software has some sort of paging support which translates your C# code to the correct dialect for your chosen RDBMS.
A good example (I've been working with a bit lately) is ServiceStack OrmLite.
I believe it to be free as long as you are using the legacy V3 version (which is pretty darn good: https://github.com/ServiceStackV3/ServiceStackV3), and I've seen some forks of it on GitHub which are currently maintained (http://www.nservicekit.com/).
There is a small learning curve, but nothing the examples/docs can't teach you.
Here's an extension method I'm using to page my queries in my service layer:
public static SqlExpressionVisitor<T> PageByRequest<T>(this SqlExpressionVisitor<T> expr, PagedRequest request)
{
return expr.Limit((request.PageNumber - 1) * request.PageSize, request.PageSize);
}
The request contains the page number and page size (from my web app), and the Limit extension method in OrmLite does the rest. I should probably add that the <T> generic parameter is the object type that OrmLite will map to after it has queried.
Here's an example of that (it's just a POCO with some annotations):
[Alias("Customers")]
public class Customer : IHasId<string>
{
[Alias("AccountCode")]
public string Id { get; set; }
public string CustomerName { get; set; }
// ... a load of other fields
}
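PagedRequest itself isn't shown in this answer; it could be as simple as the following sketch (the class shape is an assumption based on the description above):

public class PagedRequest
{
    // 1-based page index requested by the client
    public int PageNumber { get; set; }

    // number of rows per page
    public int PageSize { get; set; }
}

Any SqlExpressionVisitor<Customer> you build for a query can then be paged by calling .PageByRequest(request) on it before the query is executed.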
The method is translated to T-SQL and results in the following query against the DB (for this example I selected page 4 on my customer list with a page size of 10):
SELECT <A big list of Fields> FROM
(SELECT ROW_NUMBER() OVER (ORDER BY AccountCode) As RowNum, * FROM "Customers")
AS RowConstrainedResult
WHERE RowNum > 40 AND RowNum <= 50
This keeps the query time down to way less than a second and means I don't need to write a shedload of vendor-specific SQL.
It really depends on how much application you have already got - if you are too far in, it may be a nightmare to refactor for an ORM, but it's worth considering for other projects.
Related
I'm not sure "External Source" is the correct phrasing, but essentially I have a view in my database that points to a table in a different database. Not always, but from time to time I get an ORA-12537 Network Session: End of File exception. I'm using Entity Framework, so I tried breaking it up: instead of using one massive query, it runs a handful of queries to generate the final result. But this has had mixed to no impact.
public List<SomeDataModel> GetDataFromList(List<string> SOME_LIST_OF_STRINGS)
{
    var retData = new List<SomeDataModel>();
    const int MAX_CHUNK_SIZE = 1000;
    var totalPages = (int)Math.Ceiling((decimal)SOME_LIST_OF_STRINGS.Count / MAX_CHUNK_SIZE);
    var pageList = new List<List<string>>();

    for (var i = 0; i < totalPages; i++)
    {
        var chunkItems = SOME_LIST_OF_STRINGS.Skip(i * MAX_CHUNK_SIZE).Take(MAX_CHUNK_SIZE).ToList();
        pageList.Add(chunkItems);
    }

    using (var context = new SOMEContext())
    {
        foreach (var pageChunk in pageList)
        {
            // Filter on the current chunk, not the full list, so each query only
            // carries up to MAX_CHUNK_SIZE values.
            var result = (from r in context.SomeEntity
                          where pageChunk.Contains(r.SomeString)
                          select r).ToList();
            result.ForEach(x => retData.Add(mapper.Map<SomeDataModel>(x)));
        }
    }
    return retData;
}
I'm not sure if there's a different approach to dealing with this exception, or if breaking up the query has the desired effect. It's probably worth noting that SOME_LIST_OF_STRINGS is pretty large (about 21,000 entries on average), so totalPages usually sits around 22.
Sometimes, that error can be caused by an excessively large "IN" list in the SQL. For example:
SELECT *
FROM tbl
WHERE somecol IN ( ...huge list of stuff... );
Enabling application or database level tracing could help reveal whether the SQL that's being constructed behind the scenes has a large IN list.
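For example, if you happen to be on EF6 there is a lightweight way to see the generated SQL without setting up a full trace (a sketch; Database.Log is an EF6 feature, so adjust if you are on an older version):

using (var context = new SOMEContext())
{
    // Write every SQL statement EF sends to the provider, so the size of the IN list is visible.
    context.Database.Log = sql => System.Diagnostics.Debug.Write(sql);

    var someStrings = new List<string> { "A", "B" };   // stand-in for one chunk of values
    var result = context.SomeEntity
                        .Where(r => someStrings.Contains(r.SomeString))
                        .ToList();
}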
A workaround might be to INSERT "...huge list of stuff..." into a table and then use something similar to the query below in order to avoid the huge list of literals.
SELECT *
FROM tbl
WHERE somecol IN ( select stuff from sometable );
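A rough sketch of that workaround from the Entity Framework side follows. This is hedged: the STAGING_STRINGS and SOME_TABLE names are made up for illustration, and in practice you would batch the inserts and clear the staging table per run.

using (var context = new SOMEContext())
{
    // 1. Push the values into a staging table instead of inlining them as literals.
    foreach (var value in SOME_LIST_OF_STRINGS)
    {
        context.Database.ExecuteSqlCommand(
            "INSERT INTO STAGING_STRINGS (VALUE) VALUES ({0})", value);
    }

    // 2. Let the database join against the staging table rather than a huge IN list.
    var entities = context.Database
        .SqlQuery<SomeEntity>("SELECT * FROM SOME_TABLE WHERE SomeString IN (SELECT VALUE FROM STAGING_STRINGS)")
        .ToList();

    entities.ForEach(x => retData.Add(mapper.Map<SomeDataModel>(x)));
}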
Reference*:
https://support.oracle.com/knowledge/More%20Applications%20and%20Technologies/2226769_1.html
*I mostly drew my conclusions from the part of this reference that's not publicly viewable.
I know variants of this question have been asked before (even by me), but I still don't understand a thing or two about this...
It was my understanding that one could retrieve more documents than the 128 default setting by doing this:
session.Advanced.MaxNumberOfRequestsPerSession = int.MaxValue;
And I've learned that a WHERE clause should be an ExpressionTree instead of a Func, so that it's treated as Queryable instead of Enumerable. So I thought this should work:
public static List<T> GetObjectList<T>(Expression<Func<T, bool>> whereClause)
{
using (IDocumentSession session = GetRavenSession())
{
return session.Query<T>().Where(whereClause).ToList();
}
}
However, that only returns 128 documents. Why?
Note, here is the code that calls the above method:
RavenDataAccessComponent.GetObjectList<Ccm>(x => x.TimeStamp > lastReadTime);
If I add Take(n), then I can get as many documents as I like. For example, this returns 200 documents:
return session.Query<T>().Where(whereClause).Take(200).ToList();
Based on all of this, it would seem that the appropriate way to retrieve thousands of documents is to set MaxNumberOfRequestsPerSession and use Take() in the query. Is that right? If not, how should it be done?
For my app, I need to retrieve thousands of documents (that have very little data in them). We keep these documents in memory and use them as the data source for charts.
** EDIT **
I tried using int.MaxValue in my Take():
return session.Query<T>().Where(whereClause).Take(int.MaxValue).ToList();
And that returns 1024. Argh. How do I get more than 1024?
** EDIT 2 - Sample document showing data **
{
"Header_ID": 3525880,
"Sub_ID": "120403261139",
"TimeStamp": "2012-04-05T15:14:13.9870000",
"Equipment_ID": "PBG11A-CCM",
"AverageAbsorber1": "284.451",
"AverageAbsorber2": "108.442",
"AverageAbsorber3": "886.523",
"AverageAbsorber4": "176.773"
}
It is worth noting that since version 2.5, RavenDB has an "unbounded results API" to allow streaming. The example from the docs shows how to use this:
var query = session.Query<User>("Users/ByActive").Where(x => x.Active);
using (var enumerator = session.Advanced.Stream(query))
{
while (enumerator.MoveNext())
{
User activeUser = enumerator.Current.Document;
}
}
There is support for standard RavenDB queries and Lucene queries, and there is also async support.
The documentation can be found here. Ayende's introductory blog article can be found here.
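For the async case mentioned above, the shape is similar. This is only a sketch, assuming an async session (IAsyncDocumentSession) from the same client version and that the code lives inside an async method:

var query = asyncSession.Query<User>("Users/ByActive").Where(x => x.Active);
using (var enumerator = await asyncSession.Advanced.StreamAsync(query))
{
    while (await enumerator.MoveNextAsync())
    {
        User activeUser = enumerator.Current.Document;
        // process activeUser
    }
}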
The Take(n) function will only give you up to 1024 by default. However, you can change this default in Raven.Server.exe.config:
<add key="Raven/MaxPageSize" value="5000"/>
For more info, see: http://ravendb.net/docs/intro/safe-by-default
The Take(n) function will only give you up to 1024 by default. However, you can use it in tandem with Skip(n) to page through all of the results:
RavenQueryStatistics stats;
var points = new List<T>();
var nextGroupOfPoints = new List<T>();
const int ElementTakeCount = 1024;
int i = 0;
int skipResults = 0;

do
{
    nextGroupOfPoints = session.Query<T>()
        .Statistics(out stats)
        .Where(whereClause)
        .Skip(i * ElementTakeCount + skipResults)
        .Take(ElementTakeCount)
        .ToList();
    i++;
    skipResults += stats.SkippedResults;
    points = points.Concat(nextGroupOfPoints).ToList();
}
while (nextGroupOfPoints.Count == ElementTakeCount);

return points;
RavenDB Paging
The number of requests per session is a separate concept from the number of documents retrieved per call. Sessions are short-lived and are expected to have only a few calls issued over them.
If you are getting more than 10 of anything from the store (let alone the default 128) for human consumption, then something is wrong, or your problem requires different thinking than hauling a truckload of documents out of the data store.
RavenDB indexing is quite sophisticated. Good article about indexing here and facets here.
If you need to perform data aggregation, create a map/reduce index that produces the aggregated data, e.g.:
Index:
// map
from post in docs.Posts
select new { post.Author, Count = 1 }

// reduce
from result in results
group result by result.Author into g
select new
{
    Author = g.Key,
    Count = g.Sum(x => x.Count)
}
Query:
session.Query<AuthorPostStats>("Posts/ByUser/Count").OrderBy(x => x.Author).ToList();
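For reference, the same index can be expressed as a strongly typed AbstractIndexCreationTask. This is only a sketch; the Post document class and the index class name are assumptions:

public class Posts_ByUser_Count : AbstractIndexCreationTask<Post, AuthorPostStats>
{
    public Posts_ByUser_Count()
    {
        // Map: emit one entry per post with a count of 1.
        Map = posts => from post in posts
                       select new { post.Author, Count = 1 };

        // Reduce: sum the counts per author.
        Reduce = results => from result in results
                            group result by result.Author into g
                            select new { Author = g.Key, Count = g.Sum(x => x.Count) };
    }
}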
You can also use a predefined index with the Stream method. You may use a Where clause on indexed fields.
var query = session.Query<User, MyUserIndex>();

// or, filtering on a field covered by the index:
var query = session.Query<User, MyUserIndex>().Where(x => !x.IsDeleted);
using (var enumerator = session.Advanced.Stream<User>(query))
{
while (enumerator.MoveNext())
{
var user = enumerator.Current.Document;
// do something
}
}
Example index:
public class MyUserIndex: AbstractIndexCreationTask<User>
{
public MyUserIndex()
{
this.Map = users =>
from u in users
select new
{
u.IsDeleted,
u.Username,
};
}
}
Documentation: What are indexes?
Session : Querying : How to stream query results?
Important note: the Stream method will NOT track objects. If you change objects obtained from this method, SaveChanges() will not be aware of any change.
Other note: you may get the following exception if you do not specify the index to use.
InvalidOperationException: StreamQuery does not support querying dynamic indexes. It is designed to be used with large data-sets and is unlikely to return all data-set after 15 sec of indexing, like Query() does.
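To illustrate the tracking note: if you need to modify a document that you found while streaming, one option is to load it again in a regular session so it becomes tracked. A sketch only; store is assumed to be your IDocumentStore, and the Id property is an assumption about the User class:

using (var session = store.OpenSession())
{
    // Re-load the document by id so this session tracks it.
    var trackedUser = session.Load<User>(streamedUser.Id);
    trackedUser.Username = "new-name";
    session.SaveChanges();   // only tracked entities are written back
}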
I tried searching but couldn't find a proper answer.
I am creating an application that contains a lot of different objects. The data for these objects is saved in an MSSQL database. What is the best way to get data out?
For simplicity I will use two objects here:
ItemObject
UserObject
Both of them have a constructor which gets its data from the database:
public ItemObject(int ID) //same one for UserObject
{
//Code to get the data from the database for this particular item
}
ItemObject has a property called CreatedBy which is a UserObject.
Now the question is what is the best way to create the ItemObject?
I have two possible solutions:
Solution #1:
public ItemObject(int ID)
{
    DataTable dt = dal.GetDataTable("SELECT TOP 1 * FROM Items WHERE ID = @ID");
    this.CreatedBy = new UserObject((int)dt.Rows[0]["UserID"]);
}
Solution #2
public ItemObject(int ID)
{
    DataTable dt = dal.GetDataTable("SELECT TOP 1 * FROM Items INNER JOIN Users ON Items.CreatedBy = Users.ID WHERE Items.ID = @ID");
    this.CreatedBy = new UserObject((int)dt.Rows[0]["UserID"], dt.Rows[0]["Username"].ToString());
}

public UserObject(int ID, string Username)
{
    this.ID = ID;
    this.Username = Username;
}
In solution #1 I hit the database twice, but in solution #2 only once, although solution #1 is much "cleaner" and easier to read.
Edited after Steve's correction.
I would go with solution two. From my point of view solution #1 is not acceptable, even though it is "cleaner".
And I don't think there is a single best practice for reading data into objects. I like Entity Framework a lot for this purpose.
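To illustrate that last point, here is a hedged sketch of what the same load might look like with Entity Framework. The context and navigation property names are assumptions; the point is that the item and its creator come back in a single query:

using System.Data.Entity;   // for the lambda Include overload

using (var context = new ShopContext())   // hypothetical DbContext
{
    var item = context.Items
                      .Include(i => i.CreatedBy)          // eager-load the related user
                      .SingleOrDefault(i => i.ID == id);
}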
A third-party application creates one database per project. All the databases have the same tables and structure. New projects may be added at any time, so I can't rely on a fixed EF schema.
What I do now is:
private IEnumerable<Respondent> getListRespondentWithStatuts(string db)
{
return query("select * from " + db + ".dbo.respondent");
}
private List<Respondent> query(string sqlQuery)
{
using (var sqlConx = new SqlConnection(Settings.Default.ConnectionString))
{
sqlConx.Open();
var cmd = new SqlCommand(sqlQuery, sqlConx);
return transformReaderIntoRespondentList(cmd.ExecuteReader());
}
}
private List<Respondent> transformReaderIntoRespondentList(SqlDataReader sqlDataReader)
{
var listeDesRépondants = new List<Respondent>();
while (sqlDataReader.Read())
{
var respondent = new Respondent
{
CodeRépondant = (string)sqlDataReader["ResRespondent"],
IsActive = (bool?)sqlDataReader["ResActive"],
CodeRésultat = (string)sqlDataReader["ResCodeResult"],
Téléphone = (string)sqlDataReader["Resphone"],
IsUnContactFinal = (bool?)sqlDataReader["ResCompleted"]
};
listeDesRépondants.Add(respondent);
}
return listeDesRépondants;
}
This works fine, but it is deadly slow (20,000 records per minute). Do you have any hints on which strategy would be faster? For info, what is really slow is the transformReaderIntoRespondentList method.
Thanks!!
Generally speaking, SELECT * FROM is bad practice, and it could also be forcing you to pull back more data than is actually required. The transform is operating on only a few columns; are more columns than required being returned? Consider replacing it with:
private IEnumerable<Respondent> getListRespondentWithStatuts(string db)
{
return query("select ResRespondent, ResActive, ResCodeResult, Resphone, ResCompleted from " + db + ".dbo.respondent");
}
Also, guard against SQL injection attacks; concatenating strings to build SQL queries is very dangerous.
When pulling data from a DataReader, I find that using the non-named (ordinal) lookups works best:
var respondent = new Respondent
{
CodeRépondant = sqlDataReader.GetString(0),
IsActive = sqlDataReader.IsDBNull(1) ? (Boolean?)null : sqlDataReader.GetBoolean(1),
CodeRésultat = sqlDataReader.GetString(2),
Téléphone = sqlDataReader.GetString(3),
IsUnContactFinal = sqlDataReader.IsDBNull(4) ? (Boolean?)null : sqlDataReader.GetBoolean(4)
};
I have not explicitly tested the performance difference in a long while, but it used to make a notable difference: the ordinal accessors avoid the named column lookup and also avoid boxing/unboxing the values.
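If you prefer to keep the column names visible, a middle ground is to resolve the ordinals once before the loop (a sketch against the reader and list from the question):

// Resolve each ordinal once, outside the while (sqlDataReader.Read()) loop.
int ordRespondent = sqlDataReader.GetOrdinal("ResRespondent");
int ordActive     = sqlDataReader.GetOrdinal("ResActive");
int ordCodeResult = sqlDataReader.GetOrdinal("ResCodeResult");
int ordPhone      = sqlDataReader.GetOrdinal("Resphone");
int ordCompleted  = sqlDataReader.GetOrdinal("ResCompleted");

while (sqlDataReader.Read())
{
    var respondent = new Respondent
    {
        CodeRépondant = sqlDataReader.GetString(ordRespondent),
        IsActive = sqlDataReader.IsDBNull(ordActive) ? (bool?)null : sqlDataReader.GetBoolean(ordActive),
        CodeRésultat = sqlDataReader.GetString(ordCodeResult),
        Téléphone = sqlDataReader.GetString(ordPhone),
        IsUnContactFinal = sqlDataReader.IsDBNull(ordCompleted) ? (bool?)null : sqlDataReader.GetBoolean(ordCompleted)
    };
    listeDesRépondants.Add(respondent);
}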
Other than that, without more info it is hard to say... do you need all 20,000 records?
UPDATE
Ran a simple local test case with 300,000 records and reduced the time to load all data by almost 50%. I imagine these results will vary depending on the type of data being retrieved; but it still does make a difference on overall execution time. That being said, in my environment we are talking a drop from 650ms to just over 300ms.
NOTE
If respondent is a view, what is likely "really slow" is the database building up the result set; although the data reader will start processing information as soon as records are available, the ultimate bottleneck will be the database itself and/or network latency. Other than the above optimizations, there is not going to be much that you can do with your code unless you can index the view/table to optimize the query and or reduce the information required.
I have a DB like this that I generated from EF:
Now I'd like to add a "fielduserinput" entity so I write the following code:
public bool AddValueToField(string field, string value, string userId)
{
//adds a value to the db
var context = new DBonlyFieldsContainer();
var fieldSet = (from fields in context.fieldSet
where fields.fieldName.Equals(field)
select fields).SingleOrDefault();
var userSet = (from users in context.users
where users.id.Equals(userId)
select users).SingleOrDefault();
var inputField = new fielduserinput { userInput = value, field = fieldSet, user = userSet };
return false;
}
Obviously it's not finished but I think it conveys what I'm doing.
Is this really the right way of doing this? My goal is to add a row to fielduserinput that contains the value and references to user and field. It seems a bit tedious to do it this way. I'm imagining something like:
public bool AddValueToField(string userId, string value, string fieldId)
{
var context = new db();
var newField = { field.fieldId = idField, userInput = value, user.id = userId }
//Add and save changes
}
For older versions of EF, I think you're doing more or less what needs to be done. It's one of the many reasons I didn't feel EF was ready until recently. I'm going to lay out the scenario we have to give you another option.
We use the code-first approach in EF 4 CTP. If this change is important enough, read on, wait for other answers (because the Flying Spaghetti Monster knows I could be wrong) and then decide if you want to upgrade. Keep in mind it's a CTP, not an RC, so considerable changes could be coming. But if you're starting to write a new application, I highly recommend reading up on it before getting too far.
With the code-first approach, it is possible to create models that contain a property for a reference to another model and a property for the id of the other model (User & UserId). When configured correctly, setting a value for either the reference or the id will set the id correctly in the database.
Take the following class ...
public class FieldUserInput{
public int UserId {get;set;}
public int FieldId {get;set;}
public virtual User User {get;set;}
public virtual Field Field {get;set;}
}
... and configuration
public class FieldUserInputConfiguration{
    public FieldUserInputConfiguration(){
        MapSingleType(fli => new {
            userid = fli.UserId,
            fieldid = fli.FieldId
        });
        HasRequired(fli => fli.User).HasConstraint((fli, u) => fli.UserId == u.Id);
        HasRequired(fli => fli.Field).HasConstraint((fli, f) => fli.FieldId == f.Id);
    }
}
You can write the code...
public void CreateField(User user, int fieldId){
var context = new MyContext();
var fieldUserInput = new FieldUserInput{ User = user, FieldId = fieldId };
context.FieldUserInputs.Add(fieldUserInput);
context.SaveChanges();
}
... or vice versa with the properties and everything will work out fine in the database. Here's a great post on full configuration of EF.
Another point to remember is that this level of configuration is not necessary. Code first can be used without any configuration at all if you stick to the conventions described in the first set of posts referenced. It doesn't create the prettiest names in the database, but it works.
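As a rough illustration of that zero-configuration route (a sketch; the context name and DbSet property names are assumptions):

// With code-first conventions, exposing DbSet properties is enough to get a working model;
// table and column names are derived from the class and property names.
public class MyContext : DbContext
{
    public DbSet<User> Users { get; set; }
    public DbSet<Field> Fields { get; set; }
    public DbSet<FieldUserInput> FieldUserInputs { get; set; }
}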
Not a great answer, but figured I'd share.