String concatenation optimisation

We're currently using LINQ to generate SQL queries, with a bit of magic inside to handle case-specific queries.
Up until now, it's worked fine; very fast, hardly any issues. We've recently run into efficiency issues when querying a large amount of data from the database.
We construct the query as such:
var someIntList = new List<int> { 1,2,3,4,5 };
var query = dtx.Query.Containers.Where(c => c.ContainerID.IsIn(someIntList));
or
var someStringList = new List<string> { "a", "b", "c" };
query = dtx.Query.Containers.Where(c => c.BuildingName.IsIn(someStringList));
Which would generate (along with a bunch of other stuff which isn't related to this):
SELECT * FROM Container WHERE ContainerID IN (1,2,3,4,5)
and
SELECT * FROM Container WHERE BuildingName IN ('a','b','c')
Now in this particular situation, we need to return 50,000 rows, which are fetched through 5 separate queries to split up the load.
The DB returns fairly quickly (within seconds), however generating the query takes a long time.
Here's the very last function which is called to generate this particular query:
private static string GetSafeValueForItem(object item)
{
if (item == null)
return "NULL";
if (item is bool)
return ((bool)item ? "1" : "0");
if (item is string)
return string.Format("'{0}'", item.ToString().Replace("'", "''"));
if (item is IEnumerable)
return ListToDBList((IEnumerable)item);
if (item is DateTime)
return string.Format("'{0}'", ((DateTime)item).ToString("yyyy-MM-dd HH:mm:ss"));
return item.ToString();
}
private static string ListToDBList(IEnumerable list)
{
var str = list.Cast<object>().Aggregate("(", (current, item) => current + string.Format("{0},", GetSafeValueForItem(item)));
str = str.Trim(',');
str += ")";
return str;
}
Are there any obvious improvements that would speed up the string concatenation in this case? Refactoring the code to use a different implementation (such as skipping the query generation and hitting the database directly) is not preferred, but if it offered a big performance boost I'd be glad to hear about it.

Your Aggregate code is basically string concatenation in a loop. Don't do that.
Options:
Use StringBuilder
Use string.Join

Here's an example using String.Join that outputs the same as your ListToDBList:
String.Format("({0})", String.Join(",", list.Cast<object>().Select(item=>GetSafeValueForItem(item)).ToArray()));
See here for an explanation why concatenating in a loop using + (which is what your call to Aggregate was doing) is slow: http://www.yoda.arachsys.com/csharp/stringbuilder.html

I haven't made test cases and profiled your code, so I don't know how much improvement you can expect.
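Use a StringBuilder instead of String.Format and the += operator. In a loop, += allocates a new string and copies everything built so far on every iteration, so the total work grows quadratically with the number of items. I suspect String.Format is going to be somewhat slow, too.
You could also try string.Join instead of manually joining the array. It accepts an IEnumerable<string> directly in .NET Framework 4 and later; earlier versions need an array.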
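For illustration, here is a rough sketch of what the StringBuilder route could look like. SqlListBuilder is a made-up class name, and GetSafeValueForItem is cut down to the null, string and "everything else" cases, so treat it as a stand-in for the version in the question rather than a replacement:

```csharp
using System;
using System.Collections;
using System.Text;

public static class SqlListBuilder
{
    // Simplified stand-in for the question's GetSafeValueForItem
    // (assumption: only null, string and "everything else" are handled).
    public static string GetSafeValueForItem(object item)
    {
        if (item == null)
            return "NULL";
        var s = item as string;
        if (s != null)
            return "'" + s.Replace("'", "''") + "'";
        return item.ToString();
    }

    // Builds "(a,b,c)" with one StringBuilder instead of repeated + concatenation.
    public static string ListToDBList(IEnumerable list)
    {
        var sb = new StringBuilder("(");
        bool first = true;
        foreach (object item in list)
        {
            // Write the separator before every item except the first,
            // so there is no trailing comma to trim afterwards.
            if (!first)
                sb.Append(',');
            sb.Append(GetSafeValueForItem(item));
            first = false;
        }
        return sb.Append(')').ToString();
    }
}
```

Each item is appended to the same buffer, so the work stays roughly proportional to the length of the output. Whether that is a measurable win over string.Join for your lists is something only a profiler can tell you.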

Not sure why you're doing list.Cast when a plain IEnumerable will be of objects anyway. But your whole ListToDBList can be replaced by
string.Format("({0})", string.Join(",",list.ToArray()));
Not sure how much quicker it would be, but it's clearer to my mind.

Related

How to handle null strings?

Say I have the following code:
Request.QueryString["ids"].Split('|');
If ids is not present in the query string this will throw an exception. Is there a generally accepted way to handle this type of situation? I think all of the following options would keep this from throwing an error, but I'm wondering if one (or some different method entirely) is generally accepted as better.
string[] ids = (Request.QueryString["ids"] ?? "").Split('|');
or
string[] ids;
if(!String.IsNullOrEmpty(Request.QueryString["ids"]))
{
ids = Request.QueryString["ids"].Split('|');
}
or
?
I think all of these will work, but they look sort of ugly. Is there a better* way?
*better = easier to read, faster, more efficient or all of the above.

I like using an extension method for this:
public static string EmptyIfNull(this string self)
{
return self ?? "";
}
Usage:
string[] ids = Request.QueryString["ids"].EmptyIfNull().Split('|');

Personally I'd use
string idStr = Request.QueryString["ids"];
string[] ids = idStr == null ? new string[0] : idStr.Split('|');

string[] ids = (Request.QueryString["ids"] as string).Split('|');
This will fail in the same manner as Request.QueryString["ids"]
string[] ids;
if(!String.IsNullOrEmpty(Request.QueryString["ids"]))
{
ids = Request.QueryString["ids"].Split('|');
}
Heavier, and it may run the data retrieval logic twice (so any side effects could happen twice by mistake). Storing the value in a temporary avoids that, but makes the code heavier still.
string[] ids = (Request.QueryString["ids"] ?? "").Split('|');
Definitely the easiest, cleanest and most efficient way, as the compiler will generate the temporary itself.
If you encounter this kind of processing often, you can build your own plumbing library with a bunch of methods with fluent names and behaviors.
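One detail worth keeping in mind with the ?? "" variants: splitting an empty string gives back an array with a single empty element, not an empty array. If the calling code expects "no ids" to mean zero elements, a small helper along these lines may be closer to what you want. QueryStringIds and SplitIds are hypothetical names used only for this sketch:

```csharp
using System;

public static class QueryStringIds
{
    // Hypothetical helper: returns an empty array for null or empty input,
    // otherwise the pipe-separated parts.
    public static string[] SplitIds(string raw)
    {
        if (string.IsNullOrEmpty(raw))
            return new string[0];
        return raw.Split('|');
    }
}
```

Usage would be string[] ids = QueryStringIds.SplitIds(Request.QueryString["ids"]); which reads the value once and never throws for a missing key.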

Linq-to-SQL: Combining (OR'ing) multiple "Contains" filters?

I'm having some trouble figuring out the best way to do this, and I would appreciate any help.
Basically, I'm setting up a filter that allows the user to look at a history of audit items associated with an arbitrary "filter" of usernames.
The datasource is a SQL Server database, so I'm taking the IQueryable "source" (either a direct table reference from the db context object, or perhaps an IQueryable that's resulted from additional queries), applying the WHERE filter, and then returning the resultant IQueryable object. However, I'm a little stumped as to how to perform an OR using this approach.
I've considered going the route of Expressions because I know how to OR those, but I haven't been able to figure out how to do that with a "Contains" type evaluation, so I'm currently using a UNION. I'm afraid this might have a negative impact on performance, and I'm wondering whether it will give me exactly what I need if other filters (in addition to the user name filtering shown here) are added in an arbitrary order.
Here is my sample code:
public override IQueryable<X> ApplyFilter<X>(IQueryable<X> source)
{
// Take allowed values...
List<string> searchStrings = new List<string>();
// <SNIP> (This just populates my list of search strings)
IQueryable<X> oReturn = null;
// Step through each iteration, and perform a 'LIKE %value%' query
string[] searchArray = searchStrings.ToArray();
for (int i = 0; i < searchArray.Length; i++)
{
string value = searchArray[i];
if (i == 0)
// For first step, perform direct WHERE
oReturn = source.Where(x => x.Username.Contains(value));
else
// For additional steps, perform UNION on WHERE
oReturn = oReturn.Union(source.Where(x => x.Username.Contains(value)));
}
return oReturn ?? source;
}
This feels like the wrong way to do things, but it does seem to work, so my question is first, is there a better way to do this? Also, is there a way to do a 'Contains' or 'Like' with Expressions?
(Edited to correct my code: In rolling back to working state in order to post it, I apparently didn't roll back quite far enough :) )
=============================================
ETA: Per the solution given, here is my new code (in case anyone reading this is interested):
public override IQueryable<X> ApplyFilter<X>(IQueryable<X> source)
{
List<string> searchStrings = new List<string>(AllowedValues);
// <SNIP> build collection of search values
string[] searchArray = searchStrings.ToArray();
Expression<Func<X, bool>> expression = PredicateBuilder.False<X>();
for (int i = 0; i < searchArray.Length; i++)
{
string value = searchArray[i];
expression = expression.Or(x => x.Username.Contains(value));
}
return source.Where(expression);
}
(One caveat I noticed: Following the PredicateBuilder's example, an empty collection of search strings will return false (false || value1 || ... ), whereas in my original version, I was assuming an empty list should just coalesce to the unfiltered source. As I thought about it more, the new version seems to make more sense for my needs, so I adopted that.)
=============================================

You can use PredicateBuilder from LINQKit to dynamically construct your query.
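If you would rather not take the dependency, the same OR chain can be built by hand with System.Linq.Expressions. The following is only a sketch of the idea: AuditItem, ContainsFilter and UsernameContainsAny are hypothetical names standing in for your own types, and it has not been run against a real LINQ to SQL context.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical entity standing in for X in the question.
public class AuditItem
{
    public string Username { get; set; }
}

public static class ContainsFilter
{
    // Builds x => x.Username.Contains(v1) || x.Username.Contains(v2) || ...
    public static Expression<Func<AuditItem, bool>> UsernameContainsAny(IEnumerable<string> values)
    {
        ParameterExpression x = Expression.Parameter(typeof(AuditItem), "x");
        MemberExpression username = Expression.Property(x, "Username");
        MethodInfo contains = typeof(string).GetMethod("Contains", new[] { typeof(string) });

        Expression body = null;
        foreach (string value in values)
        {
            Expression call = Expression.Call(
                username, contains, Expression.Constant(value, typeof(string)));
            body = body == null ? call : Expression.OrElse(body, call);
        }

        // Same convention as PredicateBuilder.False: an empty list matches nothing.
        if (body == null)
            body = Expression.Constant(false);

        return Expression.Lambda<Func<AuditItem, bool>>(body, x);
    }
}
```

The result can be passed to source.Where(...) like any other predicate expression. All the calls share one parameter node, which is the part PredicateBuilder otherwise takes care of for you.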

How does C# lambda work?

I'm trying to implement method Find that searches the database.
I forgot to mention that I'm using PostgreSQL, so I can't use the built-in LINQ to SQL.
I want it to be like that:
var user = User.Find(a => a.LastName == "Brown");
Like it's done in List class. But when I go to List's source code (thanks, Reflector), I see this:
public T Find(Predicate<T> match)
{
if (match == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.match);
}
for (int i = 0; i < this._size; i++)
{
if (match(this._items[i]))
{
return this._items[i];
}
}
return default(T);
}
How can I implement this thing? I need to get those parameters to make the search.
Solution
Okay, I understand now that I would need LINQ to SQL to do all this good expressions stuff, otherwise I'd have to spend a lot of time reimplementing the wheel.
Since I can't use LINQ to SQL, I implemented this easy method:
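public static User Find(User match, string orderBy = "")
{
// Collect the conditions first so that several of them can be joined with AND.
// Note: the values are concatenated unescaped, which is open to SQL injection.
var conditions = new List<string>();
if (!String.IsNullOrEmpty(match.FirstName)) conditions.Add("first_name='" + match.FirstName + "'");
if (!String.IsNullOrEmpty(match.LastName)) conditions.Add("last_name='" + match.LastName + "'");
string query = String.Join(" AND ", conditions.ToArray());
return Find(query + (!String.IsNullOrEmpty(orderBy) ? orderBy : ""));
}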
This is how to use it:
var user = User.Find(new User { FirstName = "Bob", LastName = "Brown" });

Your method should accept Expression<Func<User, bool>>.
This gives you an expression tree instead of a delegate, which you can analyze and translate to SQL or to whatever other API calls your database has.
If you want everything to be generic, you may wish to implement the IQueryable interface. Useful information can be found here: LINQ Tips: Implementing IQueryable Provider
For a simple scenario, though, I would suggest not complicating things: stick with expression trees and return a plain IEnumerable<T> or even List<T>.
For your case, a first version of the code could look like this:
public IEnumerable<T> Get(Expression<Func<T, bool>> condition)
{
if (condition.Body.NodeType == ExpressionType.Equal)
{
var equalityExpression = ((BinaryExpression)condition.Body);
var column = ((MemberExpression)equalityExpression.Left).Member.Name;
var value = ((ConstantExpression)equalityExpression.Right).Value;
var table = typeof(T).Name;
var sql = string.Format("select * from {0} where {1} = '{2}'", table, column, value);
return ExecuteSelect(sql);
}
return Enumerable.Empty<T>();
}
Its complexity grows quickly as you handle more and more scenarios, so make sure you have reliable unit tests for each one.
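One scenario that tends to come up early: the cast to ConstantExpression above only works when the right-hand side is a literal such as "Brown". If the caller compares against a local variable, the right-hand side is a member access on a compiler-generated closure object instead. A possible way to cope, sketched here with the hypothetical names Person, ExpressionValues and GetRightHandValue, is to compile and run just that sub-expression:

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical entity used only for this sketch.
public class Person
{
    public string LastName { get; set; }
}

public static class ExpressionValues
{
    // Returns the value on the right-hand side of a simple equality such as
    // p => p.LastName == name, whether it is a literal or a captured variable.
    public static object GetRightHandValue<T>(Expression<Func<T, bool>> condition)
    {
        var equality = condition.Body as BinaryExpression;
        if (equality == null || equality.NodeType != ExpressionType.Equal)
            throw new NotSupportedException("Only simple == comparisons are handled.");

        var constant = equality.Right as ConstantExpression;
        if (constant != null)
            return constant.Value;

        // Not a literal: wrap the sub-expression in a parameterless lambda and run it.
        // This assumes the right-hand side does not reference the lambda parameter.
        return Expression.Lambda(equality.Right).Compile().DynamicInvoke();
    }
}
```

Compiling a sub-expression has a cost, so it is reasonable to keep the ConstantExpression fast path and only fall back to this for everything else.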
C# Samples for Visual Studio 2008 contain ExpressionTreeVisualizer that will help you to dig into Expression Trees more easily to understand how to extract information you need from it.
And of course, if you can stick with an existing LINQ implementation, I would suggest doing so. There are LINQ to SQL for SQL Server databases, LINQ to Entities for many different databases, and LINQ to NHibernate for NHibernate projects.
Many other LINQ providers can be found here: Link to Everything: A List of LINQ Providers. The amount of work needed to implement a LINQ provider is not trivial, so it's a good idea to reuse a tested and supported solution.

Exactly the same way. Just replace this._items with your users collection.
Also replace the type parameter T with the type User.

A lambda expression in source code can be converted to either a compiled executable delegate or an expression tree upon compilation. Usually we associate lambdas with delegates, but in your case, since you say you want access to the parameters (here I assume you mean LastName and "Brown"), you want an expression tree.
Once you have an expression tree, you can parse it to see exactly what it is and translate it to whatever you actually need to do.
Here are a few questions about expression trees.
Expression trees for dummies?
Bit Curious to understand Expression Tree in .NET
Sounds like you're definitely reinventing a very complicated wheel though. I'm sure it'll be a useful learning experience, but you should look into LINQ to Entities or LINQ to SQL for real-world programming.

Maybe I just haven't understood the question, but there's already a method for doing what you want: Enumerable.Where.
If you need to find a single element then use SingleOrDefault or FirstOrDefault instead.

You could do it something like this:
public static IEnumerable<User> Find(Predicate<User> match)
{
//I'm not sure of the name
using (var cn = new NpgsqlConnection("..your connection string..") )
using (var cmd = new NpgsqlCommand("SELECT * FROM Users", cn))
{
// The connection has to be open before the command can be executed
cn.Open();
using (var rdr = cmd.ExecuteReader())
{
while (rdr.Read())
{
var user = BuildUserObjectFromIDataRecord(rdr);
if (match(user)) yield return user;
}
}
}
}
And then you can call it like this
var users = User.Find(a => a.LastName == "Brown");
Note that this returns any number of users, you still have to implement the BuildUserObjectFromIDataRecord() function, and that it will always want to iterate over the entire users table. But it gives you the exact semantics you want.

One way would be to create an anonymous delegate, like so:
Predicate<User> finder = delegate(User user)
{
return user.LastName == "Brown";
};
var foundUser = User.Find(finder);

Is converting a NameValueCollection to a querystring using a c# lambda efficient?

In researching how to convert a NameValueCollection to a querystring, I have come across different methods. I am curious if the shorter lambda syntax is as efficient as it could be.
How to convert NameValueCollection to a (Query) String uses an iterating function.
public static String ConstructQueryString(NameValueCollection parameters)
{
List<String> items = new List<String>();
foreach (String name in parameters)
items.Add(String.Concat(name, "=", System.Web.HttpUtility.UrlEncode(parameters[name])));
return String.Join("&", items.ToArray());
}
Join a NameValueCollection into a querystring in C# uses a lambda expression, which looks nice but I'm not sure if it is efficient code.
private static string JoinNvcToQs(NameValueCollection qs)
{
return string.Join("&", Array.ConvertAll(qs.AllKeys, key => string.Format("{0}={1}", HttpUtility.UrlEncode(key), HttpUtility.UrlEncode(qs[key]))));
}

I would do it like this:
public static string ConstructQueryString(NameValueCollection parameters)
{
var sb = new StringBuilder();
foreach (String name in parameters)
sb.Append(String.Concat(name, "=", System.Web.HttpUtility.UrlEncode(parameters[name]), "&"));
if (sb.Length > 0)
return sb.ToString(0, sb.Length - 1);
return String.Empty;
}
This way you create fewer objects (which would otherwise have to be cleaned up by the garbage collector).

First of all, the best thing you can do is test and see whether the performance is acceptable for your application. We can tell you generalities about performance, but in the end it comes down to your needs, and only you know those.
As to the question at hand, any time you use a delegate (which is what a lambda creates) rather than executing the code directly you'll take a performance hit. In most cases the hit is acceptable but if this code needs the absolute best possible performance (say it's in an inner loop) then you need to go with your first method.
That said, if you're creating a querystring, presumably you're about to hit the database which will likely take considerably longer than either method of creating the querystring in the first place.

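The ToString method of the collection returned by HttpUtility.ParseQueryString (a subclass of NameValueCollection) will build the query string for you. A plain NameValueCollection does not do this; its ToString just returns the type name. I haven't done any benchmarking, but I'd imagine the implementation would be more efficient than something using lambdas or foreach.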
(The ToString solution doesn't seem to be well-documented; I only found it because this answer used it in a code sample.)
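A small sketch of how that could be used, assuming System.Web (or the System.Web.HttpUtility assembly on newer runtimes) is referenced. QueryStringDemo and Build are made-up names for illustration:

```csharp
using System.Collections.Specialized;
using System.Web;

public static class QueryStringDemo
{
    public static string Build(NameValueCollection parameters)
    {
        // ParseQueryString returns a NameValueCollection subclass whose
        // ToString() emits the entries as a URL-encoded query string.
        NameValueCollection qs = HttpUtility.ParseQueryString(string.Empty);

        // Copy the caller's entries into the special collection.
        qs.Add(parameters);
        return qs.ToString();
    }
}
```

Since this relies on the behavior of an internal collection type, it is worth covering with a test of your own before depending on it.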

Concatenate collection of XML tags to string with LINQ

I'm stuck with using a web service I have no control over and am trying to parse the XML returned by that service into a standard object.
A portion of the XML structure looks like this
<NO>
<L>Some text here </L>
<L>Some additional text here </L>
<L>Still more text here </L>
</NO>
In the end, I want to end up with one String property that will look like "Some text here Some additional text here Still more text here "
What I have for an initial pass is what follows. I think I'm on the right track, but not quite there yet:
XElement source = // Output from the Webservice
List<IndexEntry> result;
result = (from indexentry in source.Elements(entryLevel)
select new IndexEntry()
{
EtiologyCode = indexentry.Element("IE") == null ? null : indexentry.Element("IE").Value,
//some code to set other properties in this object
Note = (from l in indexentry.Elements("NO").Descendants()
select l.Value) //This is where I stop
// and don't know where to go
}
I know that I could add a ToList() operator at the end of that query to return the collection. Is there an operator or technique that would allow me to inline the concatenation of that collection to a single string?
Feel free to ask for more info if this isn't clear.
Thanks.

LINQ to XML is indeed the way here:
// Note: in earlier versions of .NET, string.Join only accepts
// arrays. In more modern versions, it accepts sequences.
var text = string.Join(" ", topElement.Elements("L").Select(x => x.Value));
EDIT: Based on the comment, it looks like you just need a single-expression way of representing this. That's easy, if somewhat ugly:
result = (from indexentry in source.Elements(entryLevel)
select new IndexEntry
{
EtiologyCode = indexentry.Element("IE") == null
? null
: indexentry.Element("IE").Value,
//some code to set other properties in this object
Note = string.Join(" ", indexentry.Elements("NO")
.Descendants()
.Select(x => x.Value))
}).ToList();
Another alternative is to extract it into a separate extension method (it has to be in a top-level static class):
public static string ConcatenateTextNodes(this IEnumerable<XElement> elements) =>
string.Join(" ", elements.Select(x => x.Value));
then change your code to:
result = (from indexentry in source.Elements(entryLevel)
select new IndexEntry
{
EtiologyCode = indexentry.Element("IE") == null
? null
: indexentry.Element("IE").Value,
//some code to set other properties in this object
Note = indexentry.Elements("NO")
.Descendants()
.ConcatenateTextNodes()
}).ToList();
EDIT: A note about efficiency
Other answers have suggested using StringBuilder in the name of efficiency. I would check for evidence of this being the right way to go before using it. If you think about it, StringBuilder and ToArray do similar things - they create a buffer bigger than they need to, add data to it, resize it when necessary, and come out with a result at the end. The hope is that you won't need to resize too often.
The difference between StringBuilder and ToArray here is what's being buffered - in StringBuilder it's the entire contents of the string you've built up so far. With ToArray it's just references. In other words, resizing the internal buffer used for ToArray is likely to be cheaper than resizing the one for StringBuilder, particularly if the individual strings are long.
After doing the buffering in ToArray, string.Join is hugely efficient: it can look through all the strings to start with, work out exactly how much space to allocate, and then concatenate them, copying each string's character data only once.
This is in sharp contrast to a previous answer I've given - but unfortunately I don't think I ever wrote up the benchmark.
I certainly wouldn't expect ToArray to be significantly slower, and I think it makes the code simpler here: no need for side effects, aggregation and so on.

I don't have experience with it myself, but it strikes me that LINQ to XML could vastly simplify your code. Do a select of XML document, then loop through it and use a StringBuilder to append the L element to some string.

The other option is to use Aggregate()
var q = topelement.Elements("L")
.Select(x => x.Value)
.Aggregate(new StringBuilder(),
(sb, x) => sb.Append(x).Append(" "),
sb => sb.ToString().Trim());
edit: The first lambda in Aggregate is the accumulator. This is taking all of your values and creating one value from them. In this case, it is creating a StringBuilder with your desired text. The second lambda is the result selector. This allows you to translate your accumulated value into the result you want. In this case, changing the StringBuilder to a String.

I like LINQ as much as the next guy, but you're reinventing the wheel here. The XmlElement.InnerText property does exactly what's being asked for.
Try this:
using System.Xml;
class Program
{
static void Main(string[] args)
{
XmlDocument d = new XmlDocument();
string xml =
@"<NO>
<L>Some text here </L>
<L>Some additional text here </L>
<L>Still more text here </L>
</NO>";
d.LoadXml(xml);
Console.WriteLine(d.DocumentElement.InnerText);
Console.ReadLine();
}
}
