I'm currently using C#, but I believe the question applies to more languages.
I have a method that takes a string value and throws an exception if it's too big. I want to unit test that the correct exception is thrown.
int vlen = Encoding.UTF8.GetByteCount(value);
if (vlen < 0 || 0x0FFFFFFF < vlen)
throw new ArgumentException("Valid UTF8 encoded value length is up to 256MB!", "value");
What is the best way to generate such a string? Should I just have a file of that size? Should I create such a file every time running unit tests?
string has a constructor that lets you specify a character to repeat and a length:
string longString = new string('a',0x0FFFFFFF + 1);
You can simply use a StringBuilder:
StringBuilder builder = new StringBuilder();
builder.Append('a', 0x10000000);
string s = builder.ToString();
//Console.WriteLine(s.Length);
YourMethodToTest(s);
This takes no measurable time on my machine and I'm sure there won't be a serious performance issue on your machine either.
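To tie this back to the actual test, here is a minimal sketch of a boundary check. The `Validate` method is a hypothetical stand-in for the method under test, and the `maxBytes` parameter is my own assumption, added only so the same check can be run with a small limit; the default is the limit from the question.

```csharp
using System;
using System.Text;

public static class LengthCheckSketch
{
    // Same limit as the question's check: 0x0FFFFFFF bytes of UTF-8.
    public const int MaxUtf8Bytes = 0x0FFFFFFF;

    // Hypothetical stand-in for the method under test. maxBytes is a parameter
    // only so that the check can also be exercised with a small limit.
    public static void Validate(string value, int maxBytes = MaxUtf8Bytes)
    {
        int vlen = Encoding.UTF8.GetByteCount(value);
        if (vlen > maxBytes)
            throw new ArgumentException("Value is too long.", nameof(value));
    }

    // Returns true when Validate accepts a string exactly at the limit
    // and rejects a string one byte over it, naming the right parameter.
    public static bool BoundaryIsEnforced(int maxBytes = MaxUtf8Bytes)
    {
        Validate(new string('a', maxBytes), maxBytes); // must not throw
        try
        {
            Validate(new string('a', maxBytes + 1), maxBytes);
            return false; // no exception: the check is missing
        }
        catch (ArgumentException ex)
        {
            return ex.ParamName == "value";
        }
    }
}
```

In a real test project the body of `BoundaryIsEnforced` would normally become two test methods using whatever assertion API your framework provides. Note that with the default limit each string is several hundred megabytes in memory, since .NET strings use two bytes per char.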
What I would usually use in Unit Tests is a package called AutoFixture.
With that you can do the following to generate a large string:
string.Join(string.Empty, Fixture.CreateMany<char>(length))
Related
Looking for better algorithm / technique for replacing strings in a string variable. I have to loop through an unknown number of database records and for each one, I need to replace some text in a string variable. Right now it looks like this, but there has to be a better way:
using (eds ctx = new eds())
{
string targetText = "This is a sample string with words that will get replaced based on data pulled from the database";
List<parameter> lstParameters = ctx.ciParameters.ToList();
foreach (var parameter in lstParameters)
{
string searchKey = parameter.searchKey;
string newValue = parameter.value;
targetText = targetText.Replace(searchKey, newValue);
}
}
From my understanding this is not good because I'm overwriting the targetText variable over and over in the loop. However, I'm not sure how to structure the find and replace...
Appreciate any feedback.
there has to be a better way
Strings are immutable - you can't "change" them - all you can do is create a new string and replace the variable value (which is not as bad as you think). You could try using a StringBuilder as others suggest, but it's not 100% guaranteed to improve your performance.
You could change your algorithm to loop through the "words" in targetText, see if there's a match in parameters , take the "replacement" value and build up a new string, but I suspect the extra lookups will cost more than recreating the string value multiple times.
In any case, two important principles of performance improvement should be considered:
Start with the slowest part of your app first - you may see some improvement but if it does not improve the overall performance significantly then it doesn't matter that much
The only way to know if a particular change will improve your performance (and by how much) is to try it both ways and measure it.
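A rough way to "try it both ways and measure it" is sketched below. The class and method names are made up for illustration, and a `Stopwatch` loop like this is only a coarse comparison, not a rigorous benchmark.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

public static class ReplaceTiming
{
    // Variant 1: a new string is created for every replacement.
    public static string WithString(string text, IList<KeyValuePair<string, string>> pairs)
    {
        foreach (var pair in pairs)
            text = text.Replace(pair.Key, pair.Value);
        return text;
    }

    // Variant 2: all replacements run against one StringBuilder.
    public static string WithBuilder(string text, IList<KeyValuePair<string, string>> pairs)
    {
        var builder = new StringBuilder(text);
        foreach (var pair in pairs)
            builder.Replace(pair.Key, pair.Value);
        return builder.ToString();
    }

    // Runs the action once as a warm-up, then times the given number of iterations.
    public static TimeSpan Measure(Func<string> action, int iterations)
    {
        action();
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        watch.Stop();
        return watch.Elapsed;
    }
}
```

You would call it as `ReplaceTiming.Measure(() => ReplaceTiming.WithString(text, pairs), 10000)` and the same with `WithBuilder`, using text and replacement pairs that resemble your real data, since the result depends heavily on string length and the number of replacements.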
StringBuilder will have less memory overhead and better performance, especially on large strings. String.Replace() vs. StringBuilder.Replace()
using (eds ctx = new eds())
{
string targetText = "This is a sample string with words that will get replaced based on data pulled from the database";
var builder = new StringBuilder(targetText);
List<parameter> lstParameters = ctx.ciParameters.ToList();
foreach (var parameter in lstParameters)
{
string searchKey = parameter.searchKey;
string newValue = parameter.value;
builder.Replace(searchKey, newValue); // modifies the builder in place
}
targetText = builder.ToString();
}
Actually, there is a better answer, assuming you're doing a large number of replacements. You can use a StringBuilder. As you know, strings are immutable. So as you said, you're creating strings over and over again in your loop.
If you convert your string to a StringBuilder, you can run all the replacements against it and create the final string once:
// Adjust the capacity based on how much bigger you think the string will get due to replacements. The more accurate your estimate, the better this will perform.
StringBuilder s = new StringBuilder(targetText, targetText.Length * 2);
foreach (var parameter in lstParameters)
{
s.Replace(parameter.searchKey, parameter.value);
}
string targetString = s.ToString();
Now a caveat, if your list only has 2-3 items in it, this might not be any better. The answer to this question provides a nice analysis of the performance improvement you can expect to see.
I would like to know the different ways of inserting a variable into a string, in C#.
I am currently trying to insert values into a json string that I am building:
Random rnd = new Random();
int ID = rnd.Next(1, 999);
string body = @"{""currency"":""country"",""gold"":1,""detail"":""detailid-979095986"",""tId"":""help here""}";
How could I add the "ID" to the string body?
In a typical string inserting scenario, I'd do one of these:
string body = string.Format("My ID is {0}", ID);
string body = "My ID is " + ID;
However, your string is apparently JSON serialized data. I'd expect that I'd want to parse that into a class in order to work with it.
var myObj = JsonConvert.DeserializeObject<MyClass>(someString);
myObj.TID = ID;
// maybe do other things with it, then if I need JSON again...
string body = JsonConvert.SerializeObject(myObj);
One reason to take this approach is to make sure that any data I put in still makes the JSON valid. For example, if my ID were, instead of an int, a string with characters that needed escaping, directly inserting "\"\n\"" would not be the right thing to do.
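To make the escaping point concrete, here is a small sketch. It uses System.Text.Json rather than the Json.NET calls above, purely so that it runs without an extra package; the class and method names are invented for the example.

```csharp
using System.Text.Json;

public static class JsonEscapingSketch
{
    // Builds the body by concatenation; nothing escapes the value.
    public static string ByConcatenation(string id) =>
        "{\"tId\":\"" + id + "\"}";

    // Builds the body with a serializer, which escapes the value as needed.
    public static string BySerializer(string id) =>
        JsonSerializer.Serialize(new { tId = id });

    // Returns true if the text parses as JSON.
    public static bool IsValidJson(string text)
    {
        try
        {
            using (JsonDocument.Parse(text)) { return true; }
        }
        catch (JsonException)
        {
            return false;
        }
    }
}
```

With an ID such as `a"b`, the concatenated body no longer parses, while the serialized one does and round-trips the original value.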
String interpolation is the easiest way these days:
int myIntValue = 123;
string myStringValue = "one two three";
string interpolatedString = $"my int is: {myIntValue}. My string is: {myStringValue}.";
Output would be "my int is: 123. My string is: one two three.".
You can experiment with this sample yourself, over here.
The $ special character identifies a string literal as an interpolated
string. An interpolated string is a string literal that might contain
interpolation expressions. When an interpolated string is resolved to
a result string, items with interpolation expressions are replaced by
the string representations of the expression results. This feature is available starting with C# 6.
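Two details that tend to trip people up with interpolated strings are format specifiers and literal braces. A minimal sketch (the class and method names are just for illustration):

```csharp
public static class InterpolationSketch
{
    public static string Describe(int id)
    {
        // {id:D4} pads the integer to four digits;
        // {{ and }} produce literal braces in the output.
        return $"Order {id:D4} has braces: {{ok}}";
    }
}
```

Here `InterpolationSketch.Describe(42)` returns `Order 0042 has braces: {ok}`.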
You could try this:
string body = @"{""currency"":""country"",""gold"":1,""detail"":""detailid-979095986"",""tId"":""" + ID + @"""}";
You can also use string.Concat:
string body = string.Concat(@"{""currency"":""country"",""gold"":1,""detail"":""detailid-979095986"",""tId"":""", ID, @"""}");
There are a number of ways to inject values into strings, however it's easy to lose sight of encodings, and cause major breakage.
If you just want to inject a value into another string, you can use:
string concatenation
string building
string formatting
Concatenation:
The simplest and most common way to build strings is by simply concatenating them together with the + operator:
var foo = 5;
var bar = "example-" + foo;
Concatenation can be difficult to read which makes it easy to introduce bugs, but for most simple tasks is the right tool for the job.
In this case, it's a poor choice:
string body = @"{""currency"":""country"",""gold"":1,""detail"":""detailid-979095986"",""tId"":""" + ID.ToString() + @"""}";
String Building
The StringBuilder class is useful for building large strings particularly when built iteratively.
var sb = new StringBuilder();
for (var i = 0; i < 1000; i++) {
sb.Append(i.ToString());
sb.Append(" ");
}
var output = sb.ToString();
It can still be difficult to read and hard to debug, but for cases where you're joining lots of strings together, it's super efficient.
In this case, it's a poor choice:
StringBuilder sb = new StringBuilder();
sb.Append(@"{""currency"":""country"",""gold"":1,""detail"":""detailid-979095986"",""tId"":""");
sb.Append(ID.ToString());
sb.Append(@"""}");
string body = sb.ToString();
String formatting
The string.Format method makes templating data into a string super easy and efficient. If you plan on reusing the same string over and over, using a format string makes it much easier to read and debug code, particularly when there are lots of replacements:
var foo = 5;
var bar = string.Format("example-{0}", foo);
Format strings can also automatically apply culturally accurate formatting to particular data types, so that a DateTime is appropriately displayed, or so that a number has the appropriate number of trailing zeros.
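As a quick illustration of the culture point, here is a sketch that passes the culture explicitly. The helper name is made up; `string.Format` has an overload that takes an `IFormatProvider` as its first argument.

```csharp
using System.Globalization;

public static class CultureFormatSketch
{
    // Formats a number with two decimals using the given culture's separators.
    public static string FormatAmount(double amount, CultureInfo culture) =>
        string.Format(culture, "{0:N2}", amount);
}
```

`CultureFormatSketch.FormatAmount(1234.5, CultureInfo.InvariantCulture)` returns `1,234.50`; passing a different culture, for example `new CultureInfo("de-DE")`, uses that culture's group and decimal separators instead.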
In this case, it's a poor choice:
string body = string.Format(@"{{""currency"":""country"",""gold"":1,""detail"":""detailid-979095986"",""tId"":""{0}""}}", ID); // literal braces must be doubled in a format string
The right choice
You're not dumping data into any old string. That's JSON encoded data. If you just concatenate/build/format in any old value, you can break your string. For example, if the ID variable contained a " character, you'd break the entire JSON dataset.
Additionally, the length of the string and necessary quotes make it super difficult to read, which makes it difficult to maintain. Good luck when you get around to needing to add another formatted value, it's going to be a pain to change any existing value or add in new dynamic ones.
Instead of writing a JSON literal, write an object and encode it to JSON:
var bodyData =
new
{
currency = "country",
gold = 1,
detail = "detailid-979095986",
tId = ID //here's where you set the ID
};
var jss = new JavaScriptSerializer();
var body = jss.Serialize(bodyData);
This code is much easier to modify when the data changes, and will actually encode your data correctly. You don't need to worry about all those annoying double quote characters any more either.
You can use String.Format (note that the literal braces of the JSON have to be doubled):
String.Format(@"{{""currency"":""country"",""gold"":1,""detail"":""detailid-979095986"",""tId"":""{0}""}}", ID)
Since this is params object[], you can use as many {n} as you want.
Instead of using one string, you could concatenate strings together using +, which would allow you to insert text between the generated strings.
string body = @"***" + ID + @"***";
This has been asked a few different ways but I am debating on "my way" vs "your way" with another developer. Language is C#.
I want to parse a pipe-delimited string where the first 2 characters of each chunk are my tag.
The rules. Not my rules but rules I have been given and must follow.
I can't change the format of the string.
This function will be called possibly many times so efficiency is key.
I need to keep it simple.
The input string and tag I am looking for may/will change during runtime.
Example input string: AOVALUE1|ABVALUE2|ACVALUE3|ADVALUE4
Example tag I may need value for: AB
I split the string into an array based on the delimiter and loop through the array each time the function is called. I then look at the first 2 characters and return the value minus the first 2 characters.
The "other guy's" way is to take the string and use a combination of IndexOf and Substring to find the starting point and ending point of the field I am looking for, then use Substring again to pull out the value minus the first 2 characters. So he would call IndexOf("|AB") and then find the next pipe in the string. These would be the start and end. Then Substring that out.
Now I should think that IndexOf and Substring would parse the string each time at a char-by-char level, so this would be less efficient than using large chunks and reading the string minus the first 2 characters. Or is there another way that is better than what both of us have proposed?
The other guy's approach is going to be more efficient in time, given that the input string needs to be reevaluated each time. If the input string is long, it also won't require the extra memory that splitting the string would.
If I'm trying to code a really tight loop I prefer to directly use array/string operators rather than LINQ to avoid that additional overhead:
// static so that the static method below can read it
static string inputString = "AOVALUE1|ABVALUE2|ACVALUE3|ADVALUE4";
static string FindString(string tag)
{
int startIndex;
if (inputString.StartsWith(tag))
{
startIndex = tag.Length;
}
else
{
startIndex = inputString.IndexOf(string.Format("|{0}", tag));
if (startIndex == -1)
return string.Empty;
startIndex += tag.Length + 1;
}
int endIndex = inputString.IndexOf('|', startIndex);
if (endIndex == -1)
endIndex = inputString.Length;
return inputString.Substring(startIndex, endIndex - startIndex);
}
I've done a lot of parsing in C# and I would probably take the approach suggested by the "other guys" just because it would be a bit lighter on resources used and likely to be a little faster as well.
That said, as long as the data isn't too big, there's nothing wrong with the first approach and it will be much easier to program.
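For reference, the first approach (split, then loop) could look roughly like this. It is a sketch of what the question describes in words, not the asker's actual code; the names are invented.

```csharp
using System;

public static class TagLookup
{
    // Returns the value that follows the given tag, or an empty string if the tag is absent.
    public static string FindBySplit(string input, string tag)
    {
        foreach (string chunk in input.Split('|'))
        {
            // Ordinal comparison avoids culture-sensitive matching of the tag.
            if (chunk.StartsWith(tag, StringComparison.Ordinal))
                return chunk.Substring(tag.Length);
        }
        return string.Empty;
    }
}
```

`TagLookup.FindBySplit("AOVALUE1|ABVALUE2|ACVALUE3|ADVALUE4", "AB")` returns `VALUE2`. The cost is the array and the chunk strings that `Split` allocates on every call, which is the extra memory mentioned above.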
Something like this may work ok
string myString = "AOVALUE1|ABVALUE2|ACVALUE3|ADVALUE4";
string selector = "AB";
var results = myString.Split('|').Where(x => x.StartsWith(selector)).Select(x => x.Substring(selector.Length)); // Substring strips only the leading tag; Replace would also remove later occurrences
Returns: list of the matches, in this case just one "VALUE2"
If you are just looking for the first or only match this will work.
string result = myString.Split('|').Where(x => x.StartsWith(selector)).Select(x => x.Substring(selector.Length)).FirstOrDefault();
SubString does not parse the string.
IndexOf does parse the string.
My preference would be the Split method, primarily for coding efficiency:
string[] inputArr = input.Split('|').Select(s => s.Substring(2)).ToArray(); // Substring(2) drops the 2-character tag
is pretty concise. How many LoC does the substring/indexof method take?
I'm still learning C#, and there is one thing I can't really seem to find the answer to.
If I have a string that looks like this "abcdefg012345", and I want to make it look like "ab-cde-fg-012345"
I thought of something like this:
string S1 = "abcdefg012345";
string S2 = S1.Insert(2, "-");
string S3 = S2.Insert(6, "-");
string S4 = S3.Insert.....
...
..
Now I was wondering if it would be possible to get this all into 1 line somehow, without having to make all those strings.
I assume this would be possible somehow ?
Whether or not you can make this a one-liner (you can), chaining string operations will always cause multiple strings to be created, due to the immutability of String in .NET.
If you want to do this somewhat efficiently, without creating multiple strings, you could use a StringBuilder. An extension method could also be useful to make it easier to use.
public static class StringExtensions
{
public static string MultiInsert(this string str, string insertChar, params int[] positions)
{
StringBuilder sb = new StringBuilder(str.Length + (positions.Length*insertChar.Length));
var posLookup = new HashSet<int>(positions);
for(int i=0;i<str.Length;i++)
{
sb.Append(str[i]);
if(posLookup.Contains(i))
sb.Append(insertChar);
}
return sb.ToString();
}
}
Note that this example initialises StringBuilder to the correct length up-front, therefore avoiding the need to grow the StringBuilder.
Usage: "abcdefg012345".MultiInsert("-",2,5); // yields "abc-def-g012345"
Live example: http://rextester.com/EZPQ89741
string S1 = "abcdefg012345".Insert(2, "-").Insert(6, "-")..... ;
If the positions for the inserted strings are constant you could consider using string.Format() method. For example:
string strTarget = String.Format("abc{0}def{0}g012345","-");
string s = "abcdefg012345";
foreach (var index in new[] { 2, 6 /* , ... */ })
{
s = s.Insert(index, "-");
}
I like this
StringBuilder sb = new StringBuilder("abcdefg012345");
string result = sb.Insert(6, '-').Insert(2, '-').ToString();
String s1 = "abcdefg012345";
String separator = "-";
s1 = s1.Insert(2, separator).Insert(6, separator).Insert(9, separator);
Chaining them like that keeps your line count down. This works because the Insert method returns the string value of s1 with the parameters supplied, then the Insert function is being called on that returned string and so on.
Also it's worth noting that String is an immutable class, so each time you assign a new value to a string variable, a new string is created. String is also a special type in that you can create a new instance without calling a constructor: in the first line above, the text in quotation marks creates the instance for you.
Just for the sake of completion and to show the use of the lesser known Aggregate function, here's another one-liner:
string result = new[] { 2, 5, 8, 15 }.Aggregate("abcdefg012345", (s, i) => s.Insert(i, "-"));
result is ab-cd-ef-g01234-5. I wouldn't recommend this variant, though. It's way too hard to grasp on first sight.
Edit: this solution is not valid, anyway, as the "-" will be inserted at the index of the already modified string, not at the positions with respect to the original string. But then again, most of the answers here suffer from the same problem.
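One way around the shifting-index problem is to insert from right to left, so that each insertion leaves the positions still to be processed untouched. A minimal sketch, with a hypothetical helper name:

```csharp
using System.Linq;
using System.Text;

public static class InsertSketch
{
    // Inserts the separator at positions given relative to the ORIGINAL string.
    public static string InsertAtOriginalPositions(string text, string separator, params int[] positions)
    {
        var builder = new StringBuilder(text, text.Length + positions.Length * separator.Length);

        // Highest position first: an insertion only shifts characters to its right,
        // so the smaller positions remain valid.
        foreach (int position in positions.OrderByDescending(p => p))
            builder.Insert(position, separator);

        return builder.ToString();
    }
}
```

`InsertSketch.InsertAtOriginalPositions("abcdefg012345", "-", 2, 5, 7)` returns `ab-cde-fg-012345`, which is the result the question asks for.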
You should use a StringBuilder in this case, as String objects are immutable and your code would essentially create a completely new string for each one of those operations.
http://msdn.microsoft.com/en-us/library/2839d5h5(v=vs.71).aspx
Some more information available here:
http://www.dotnetperls.com/stringbuilder
Example:
using System;
using System.Text;
namespace ConsoleApplication10
{
class Program
{
static void Main(string[] args)
{
StringBuilder sb = new StringBuilder("abcdefg012345");
sb.Insert(2, '-');
sb.Insert(6, '-');
Console.WriteLine(sb);
Console.Read();
}
}
}
If you really want it on a single line you could simply do something like this:
StringBuilder sb = new StringBuilder("abcdefg012345").Insert(2, '-').Insert(6, '-');
I have the following intentionally trivial function:
void ReplaceSome(ref string text)
{
StringBuilder sb = new StringBuilder(text);
sb[5] = 'a';
text = sb.ToString();
}
It appears to be inefficient to convert this to a StringBuilder to index into and replace some of the characters only to copy it back to the ref'd param. Is it possible to index directly into the text param as an L-Value?
Or how else can I improve this?
C# strings are "immutable," which means that they can't be modified. If you have a string, and you want a similar but different string, you must create a new string. Using a StringBuilder as you do above is probably as easy a method as any.
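If you only need to overwrite individual characters, a char array is another copy-based option that some people find more direct than a StringBuilder. This is a sketch with a made-up helper name; it still allocates a new string, since that part cannot be avoided.

```csharp
public static class ReplaceCharSketch
{
    // Returns a copy of text with the character at index replaced.
    public static string ReplaceAt(string text, int index, char replacement)
    {
        char[] chars = text.ToCharArray(); // mutable copy of the characters
        chars[index] = replacement;
        return new string(chars);
    }
}
```

`ReplaceCharSketch.ReplaceAt("hello world", 5, 'z')` returns `hellozworld`.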
Armed with Reflector and the decompiled IL: on a pure LOC basis, the StringBuilder approach is definitely the most efficient, e.g. when tracing the IL calls that StringBuilder makes internally vs the IL calls for String::Remove, String::Insert, etc.
I couldn't be bothered testing the memory overhead of each approach, but would imagine it would be in line with reflector results - the StringBuilder approach would be the best.
I think the fact the StringBuilder has a set memory size using the constructor
StringBuilder sb = new StringBuilder(text);
would help overall too.
Like others have mentioned, it would come down to readability vs efficiency...
text = text.Substring(0, 5) + "a" + text.Substring(6); // replaces the character at index 5, like sb[5] = 'a'
Not dramatically different than your StringBuilder solution, but slightly more concise than the Remove(), Insert() answer.
I don't know if this is more efficient, but it works. Either way you'll have to recreate the string after each change since they're immutable.
string test = "hello world";
Console.WriteLine(test);
test = test.Remove(5, 1);
test = test.Insert(5, "z");
Console.WriteLine(test);
Or if you want it more concise:
string test = "hello world".Remove(5, 1).Insert(5, "z");