In C#, best way to check if StringBuilder contains a substring

I have an existing StringBuilder object, the code appends some values and a delimiter to it.
I want to modify the code to add the logic that before appending the text, it will check if it already exists in the StringBuilder. If it does not, only then will it append the text, otherwise it is ignored.
What is the best way to do so? Do I need to change the object to string type? I need the best approach that will not hamper performance.
public static string BuildUniqueIDList(context RequestContext)
{
string rtnvalue = string.Empty;
try
{
StringBuilder strUIDList = new StringBuilder(100);
for (int iCntr = 0; iCntr < RequestContext.accounts.Length; iCntr++)
{
if (iCntr > 0)
{
strUIDList.Append(",");
}
// need to do something like:
// strUIDList.Contains(RequestContext.accounts[iCntr].uniqueid) then continue
// otherwise append
strUIDList.Append(RequestContext.accounts[iCntr].uniqueid);
}
rtnvalue = strUIDList.ToString();
}
catch (Exception e)
{
throw;
}
return rtnvalue;
}
I am not sure if having something like this will be efficient:
if (!strUIDList.ToString().Contains(RequestContext.accounts[iCntr].uniqueid.ToString()))

Personally I would use:
return string.Join(",", RequestContext.accounts
.Select(x => x.uniqueid)
.Distinct());
No need to loop explicitly, manually use a StringBuilder etc... just express it all declaratively :)
(You'd need to call ToArray() at the end if you're not using .NET 4, which would obviously reduce the efficiency somewhat... but I doubt it'll become a bottleneck for your app.)
EDIT: Okay, for a non-LINQ solution... if the size is reasonably small I'd just go for:
// First create a list of unique elements
List<string> ids = new List<string>();
foreach (var account in RequestContext.accounts)
{
string id = account.uniqueid;
if (!ids.Contains(id))
{
ids.Add(id);
}
}
// Then convert it into a string.
// You could use string.Join(",", ids.ToArray()) here instead.
StringBuilder builder = new StringBuilder();
foreach (string id in ids)
{
builder.Append(id);
builder.Append(",");
}
if (builder.Length > 0)
{
builder.Length--; // Chop off the trailing comma
}
return builder.ToString();
If you could have a large collection of strings, you might use Dictionary<string, string> as a sort of fake HashSet<string>.
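Where HashSet<string> is available (.NET 3.5 and later), it can replace both the List<string>.Contains check and the Dictionary workaround. The following is a minimal sketch, assuming the ids are plain strings; the sample array is hypothetical and stands in for the uniqueid values of RequestContext.accounts:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample data standing in for RequestContext.accounts[i].uniqueid.
string[] uniqueIds = { "A1", "B2", "A1", "C3", "B2" };

// HashSet<T>.Add returns false when the item is already present,
// so a single call both tests for and records the id.
var seen = new HashSet<string>();
var ordered = new List<string>();
foreach (string id in uniqueIds)
{
    if (seen.Add(id))
    {
        ordered.Add(id); // keeps first-seen order
    }
}

string result = string.Join(",", ordered);
Console.WriteLine(result); // A1,B2,C3
```

The separate list is only there to keep the ids in their original order; the set answers the "already added?" question in roughly constant time per lookup instead of scanning the whole list or the builder's contents.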

Related

C# check conversion from List<string> to single string using String.Join, is possible or not?

I have a List<string> of unknown length, and I need to convert the entire List<string> to a single string. Before converting, I want to check whether the conversion is possible (will it throw an OutOfMemoryException?), so that I can process that much data and continue with the rest in another batch.
Sample
int drc = ImportConfiguration.Data.Count;
List<string> queries = new List<string>() { };
//iterate over data row to generate query and execute it
for (int drn = 0; drn < drc; drn++) // drn stands for Data Row Number
{
queries.Add(Generate(ImportConfiguration.Data[drn], drn));
// So here I want to check the size:
// if the next iteration would not fit, I'll execute the batch right now
// and empty the list again for the next batch
if (drn == drc - 1 || drn % 5000 == 0)
{
SqlHelper.ExecuteNonQuery(connection, System.Data.CommandType.Text, String.Join(Environment.NewLine, queries));
queries = new List<string>() { };
}
}
Since you are trying to send a large amount of text to a SQL Server instance, you could use SQL Server's streaming support to write the string to the stream as you go, minimizing the amount of memory needed to construct the data to send.
I can't say it is not possible but I think a better way would be to do the join and catch any exceptions:
try
{
var joined = string.Join(",", list);
}
catch(OutOfMemoryException)
{
// join failed, take action (log, notify user, etc.)
}
Note: if the exception is happening, then you need to consider a different approach than using a list and joining.
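One such different approach is to bound each batch by its size in characters rather than by row count, so the joined string never grows past a known limit. This is only a sketch under stated assumptions: MaxBatchChars, the sample queries and the Execute helper are hypothetical, and Execute stands in for the SqlHelper.ExecuteNonQuery call from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical budget: flush before the pending batch would exceed this many characters.
const int MaxBatchChars = 25;

// Hypothetical stand-ins for the generated queries and for SqlHelper.ExecuteNonQuery.
string[] generated = { "INSERT 1;", "INSERT 2;", "INSERT 3;", "INSERT 4;", "INSERT 5;" };
var executedBatches = new List<string>();
void Execute(string sql) => executedBatches.Add(sql);

var batch = new StringBuilder();
foreach (string query in generated)
{
    // Flush first if adding this query would push the batch past the budget.
    if (batch.Length > 0 &&
        batch.Length + Environment.NewLine.Length + query.Length > MaxBatchChars)
    {
        Execute(batch.ToString());
        batch.Clear();
    }
    if (batch.Length > 0)
    {
        batch.Append(Environment.NewLine);
    }
    batch.Append(query);
}
// Send whatever is left after the loop.
if (batch.Length > 0)
{
    Execute(batch.ToString());
}

Console.WriteLine(executedBatches.Count);
```

In real use the budget would be far larger; the point is that the decision to flush is made from the running length, before the large string is ever built.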
You could try:
List<string> theList;
try {
String allString = String.Join(",", theList.ToArray());
} catch (OutOfMemoryException e) {
// ... handle OutOfMemoryException exception (e)
}
EDIT
Based on your comment.
You could give an estimation in the following way.
Get available memory: Take a look at this post
Get sum size of your list strings theList.Sum(s => s.Length);
List<string> theList = new List<string>{ "AAA", "BBB" };
// number of characters
var allSize = theList.Sum(s => s.Length);
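// memory currently allocated to this process, in bytes
// (note: PrivateMemorySize64 is what the process already uses, not what is still free,
// so treat this comparison as a rough heuristic only)
Process proc = Process.GetCurrentProcess();
var availableMemory = proc.PrivateMemorySize64;
// each char occupies 2 bytes
if (availableMemory > allSize * 2L) {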
// you can try
try {
String allString = String.Join(",", theList.ToArray());
} catch (OutOfMemoryException e) {
// ... handle OutOfMemoryException exception (e)
}
} else {
// it is not going to work...
}

Check whether a string is in a list at any order in C#

If we have a list of strings like in the following code:
List<string> XAll = new List<string>();
XAll.Add("#10#20");
XAll.Add("#20#30#40");
string S = "#30#20"; // same as "#20#30", and also contained in "#20#30#40", so S exists in the list
// check the unordered string S = "#30#20":
// if it is contained in any order, like #30#20 or even #20#30, then return true: it exists
if (XAll.Contains(S))
{
Console.WriteLine("Your String is exist");
}
I would prefer to use LINQ to check that S exists in XAll regardless of order, i.e. that some entry in XAll contains at least both #30 and #20 together.
I am using
var c = item2.Intersect(item1);
if (c.Count() == item1.Length)
{
return true;
}
You should represent your data in a more meaningful way. Don't rely on strings.
For example I would suggest creating a type to represent a set of these numbers and write some code to populate it.
But there are already set types such as HashSet, which is possibly a good match and has built-in functions for testing for subsets.
This should get you started:
var input = "#20#30#40";
var hashSetOfNumbers = new HashSet<int>(input
.Split(new []{'#'}, StringSplitOptions.RemoveEmptyEntries)
.Select(s=>int.Parse(s)));
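Building on that starting point, the subset test itself could be sketched as follows. The Parse helper is a hypothetical name, and the sketch assumes every entry consists only of '#'-separated integers:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Parse "#20#30#40" into the set {20, 30, 40}.
HashSet<int> Parse(string input) =>
    new HashSet<int>(input
        .Split(new[] { '#' }, StringSplitOptions.RemoveEmptyEntries)
        .Select(part => int.Parse(part)));

var xAll = new List<string> { "#10#20", "#20#30#40" };
string s = "#30#20";

// S "exists" if all of its numbers appear together in at least one entry.
HashSet<int> wanted = Parse(s);
bool exists = xAll.Any(x => wanted.IsSubsetOf(Parse(x)));
Console.WriteLine(exists); // True
```

If the list is queried often, parsing each entry into a set once and keeping the sets around would avoid re-splitting the strings on every lookup.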
This works for me:
Func<string, string[]> split =
x => x.Split(new [] { '#' }, StringSplitOptions.RemoveEmptyEntries);
if (XAll.Any(x => split(x).Intersect(split(S)).Count() == split(S).Count()))
{
Console.WriteLine("Your String is exist");
}
Now, depending on how you want to handle duplicates, this might even be a better solution:
Func<string, HashSet<string>> split =
x => new HashSet<string>(x.Split(
new [] { '#' },
StringSplitOptions.RemoveEmptyEntries));
if (XAll.Any(x => split(S).IsSubsetOf(split(x))))
{
Console.WriteLine("Your String is exist");
}
This second approach uses pure set theory so it strips duplicates.

Alternative to if, else if

I have a lot of if, else if statements and I know there has to be a better way to do this but even after searching stackoverflow I'm unsure of how to do so in my particular case.
I am parsing text files (bills) and assigning the name of the service provider to a variable (txtvar.Provider) based on if certain strings appear on the bill.
This is a small sample of what I'm doing (don't laugh, I know it's messy). All in all, There are approximately 300 if, else if's.
if (txtvar.BillText.IndexOf("SWGAS.COM") > -1)
{
txtvar.Provider = "Southwest Gas";
}
else if (txtvar.BillText.IndexOf("georgiapower.com") > -1)
{
txtvar.Provider = "Georgia Power";
}
else if (txtvar.BillText.IndexOf("City of Austin") > -1)
{
txtvar.Provider = "City of Austin";
}
// And so forth for many different strings
I would like to use something like a switch statement to be more efficient and readable but I'm unsure of how I would compare the BillText. I'm looking for something like this but can't figure out how to make it work.
switch (txtvar.BillText)
{
case txtvar.BillText.IndexOf("Southwest Gas") > -1:
txtvar.Provider = "Southwest Gas";
break;
case txtvar.BillText.IndexOf("TexasGas.com") > -1:
txtvar.Provider = "Texas Gas";
break;
case txtvar.BillText.IndexOf("Southern") > -1:
txtvar.Provider = "Southern Power & Gas";
break;
}
I'm definitely open to ideas.
I would need the ability to determine the order in which the values were evaluated.
As you can imagine, when parsing for hundreds of slightly different layouts I occasionally run into the issue of not having a distinctly unique indicator as to what service provider the bill belongs to.
Why not use everything C# has to offer? The following use of anonymous types, collection initializers, implicitly typed variables, and lambda-syntax LINQ is compact, intuitive, and maintains your modified requirement that patterns be evaluated in order:
var providerMap = new[] {
new { Pattern = "SWGAS.COM" , Name = "Southwest Gas" },
new { Pattern = "georgiapower.com", Name = "Georgia Power" },
// More specific first
new { Pattern = "City of Austin" , Name = "City of Austin" },
// Then more general
new { Pattern = "Austin" , Name = "Austin Electric Company" },
// And for everything else:
new { Pattern = String.Empty , Name = "Unknown" }
};
txtvar.Provider = providerMap.First(p => txtvar.BillText.IndexOf(p.Pattern) > -1).Name;
More likely, the pairs of patterns would come from a configurable source, such as:
var providerMap =
System.IO.File.ReadLines(@"C:\some\folder\providers.psv")
.Select(line => line.Split('|'))
.Select(parts => new { Pattern = parts[0], Name = parts[1] }).ToList();
Finally, as @millimoose points out, anonymous types are less useful when passed between methods. In that case we can define a trivial Provider class and use object initializers for nearly identical syntax:
class Provider {
public string Pattern { get; set; }
public string Name { get; set; }
}
var providerMap =
System.IO.File.ReadLines(@"C:\some\folder\providers.psv")
.Select(line => line.Split('|'))
.Select(parts => new Provider() { Pattern = parts[0], Name = parts[1] }).ToList();
Since you seem to need to search for the key before returning the value a Dictionary is the right way to go, but you will need to loop over it.
// dictionary to hold mappings
Dictionary<string, string> mapping = new Dictionary<string, string>();
// add your mappings here
// loop over the keys
foreach (KeyValuePair<string, string> item in mapping)
{
// return value if key found
if(txtvar.BillText.IndexOf(item.Key) > -1) {
return item.Value;
}
}
EDIT: If you wish to have control over the order in which elements are evaluated, use an OrderedDictionary and add the elements in the order in which you want them evaluated.
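One detail to watch with OrderedDictionary is that it is a non-generic collection, so the loop variable is a DictionaryEntry rather than a KeyValuePair<string, string>. A minimal sketch, assuming the mappings are added with the more specific pattern first; the bill text is hypothetical:

```csharp
using System;
using System.Collections;
using System.Collections.Specialized;

// Entries are enumerated in insertion order, so add the more specific patterns first.
var mapping = new OrderedDictionary
{
    { "City of Austin", "City of Austin" },
    { "Austin", "Austin Electric Company" },
};

// Hypothetical bill text for illustration.
string billText = "Statement from the City of Austin utilities office";

string provider = null;
// OrderedDictionary yields DictionaryEntry items, so keys and values need casts.
foreach (DictionaryEntry entry in mapping)
{
    if (billText.IndexOf((string)entry.Key, StringComparison.Ordinal) > -1)
    {
        provider = (string)entry.Value;
        break;
    }
}

Console.WriteLine(provider); // City of Austin
```

A List of key/value pairs would give the same ordering guarantee with generic typing; the dictionary lookup by key is never used in this scenario anyway.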
One more using LINQ and Dictionary
var mapping = new Dictionary<string, string>()
{
{ "SWGAS.COM", "Southwest Gas" },
{ "georgiapower.com", "Georgia Power" }
// ... more mappings ...
};
return mapping.Where(pair => txtvar.BillText.IndexOf(pair.Key) > -1)
.Select(pair => pair.Value)
.FirstOrDefault();
If we prefer empty string instead of null when no key matches we can use the ?? operator:
return mapping.Where(pair => txtvar.BillText.IndexOf(pair.Key) > -1)
.Select(pair => pair.Value)
.FirstOrDefault() ?? "";
If the dictionary may contain similar keys, we can add an OrderBy so that keys are tried alphabetically; a shorter key sorts before any longer key that starts with it, so this will pick 'SCE' before 'SCEC':
return mapping.Where(pair => txtvar.BillText.IndexOf(pair.Key) > -1)
.OrderBy(pair => pair.Key)
.Select(pair => pair.Value)
.FirstOrDefault() ?? "";
To avoid the blatant Schlemiel the Painter's approach that looping over all the keys would involve: let's use regular expressions!
// a dictionary that holds which bill text keyword maps to which provider
static Dictionary<string, string> BillTextToProvider = new Dictionary<string, string> {
{"SWGAS.COM", "Southwest Gas"},
{"georgiapower.com", "Georgia Power"}
// ...
};
// a regex that will match any of the keys of this dictionary
// i.e. any of the bill text keywords
static Regex BillTextRegex = new Regex(
string.Join("|", // to alternate between the keywords
from key in BillTextToProvider.Keys // grab the keywords
select Regex.Escape(key))); // escape any special characters in them
/// If any of the bill text keywords is found, return the corresponding provider.
/// Otherwise, return null.
string GetProvider(string billText)
{
var match = BillTextRegex.Match(billText);
if (match.Success)
// the Value of the match will be the found substring
return BillTextToProvider[match.Value];
else return null;
}
// Your original code now reduces to:
var provider = GetProvider(txtvar.BillText);
// the if is unnecessary if txtvar.Provider should be null in case it can't be
// determined
if (provider != null)
txtvar.Provider = provider;
Making this case-insensitive is a trivial exercise for the reader.
All that said, this does not even pretend to impose an order on which keywords to look for first - it will find the match that's located earliest in the string. (And then the one that occurs first in the RE.) You do however mention that you're searching through largeish texts; if .NET's RE implementation is at all good this should perform considerably better than 200 naive string searches. (By only making one pass through the string, and maybe a little by merging common prefixes in the compiled RE.)
If ordering is important to you, you might want to consider looking for an implementation of a better string search algorithm than .NET uses. (Like a variant of Boyer-Moore.)
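For the case-insensitive variant mentioned above, two things have to change together: the regex needs RegexOptions.IgnoreCase, and the dictionary needs a case-insensitive comparer, because the matched text may differ in case from the stored key. A sketch under those assumptions, with hypothetical sample inputs:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// The case-insensitive comparer lets a match such as "swgas.com" find the "SWGAS.COM" key.
var billTextToProvider = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    { "SWGAS.COM", "Southwest Gas" },
    { "georgiapower.com", "Georgia Power" },
};

// RegexOptions.IgnoreCase makes the alternation itself ignore case.
var billTextRegex = new Regex(
    string.Join("|", billTextToProvider.Keys.Select(Regex.Escape)),
    RegexOptions.IgnoreCase);

// Returns the provider for the earliest keyword found, or null if none matches.
string GetProvider(string billText)
{
    Match match = billTextRegex.Match(billText);
    return match.Success ? billTextToProvider[match.Value] : null;
}

Console.WriteLine(GetProvider("Pay online at www.swgas.com today")); // Southwest Gas
Console.WriteLine(GetProvider("no known provider here") ?? "(none)"); // (none)
```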
What you want is a Dictionary:
Dictionary<string, string> mapping = new Dictionary<string, string>();
mapping["SWGAS.COM"] = "Southwest Gas";
mapping["foo"] = "bar";
// ... as many as you need, maybe read from a file ...
Then just:
return mapping[inputString];
Done.
One way of doing it (other answers show very valid options):
void Main()
{
string input = "georgiapower.com";
string output = null;
// an array of string arrays...an array of Tuples would also work,
// or a List<T> with any two-member type, etc.
var search = new []{
new []{ "SWGAS.COM", "Southwest Gas"},
new []{ "georgiapower.com", "Georgia Power"},
new []{ "City of Austin", "City of Austin"}
};
for( int i = 0; i < search.Length; i++ ){
// more complex search logic could go here (e.g. a regex)
if( input.IndexOf( search[i][0] ) > -1 ){
output = search[i][1];
break;
}
}
// (optional) check that a valid result was found.
if( output == null ){
throw new InvalidOperationException( "A match was not found." );
}
// Assign the result, output it, etc.
Console.WriteLine( output );
}
The main thing to take out of this exercise is that creating a giant switch or if/else structure is not the best way to do it.
There are several approaches to do this, but for the sake of simplicity, the conditional operator may be an option:
Func<String, bool> contains=x => {
return txtvar.BillText.IndexOf(x)>-1;
};
txtvar.Provider=
contains("SWGAS.COM")?"Southwest Gas":
contains("georgiapower.com")?"Georgia Power":
contains("City of Austin")?"City of Austin":
// more statements go here
// if none of these matched, txtvar.Provider is assigned to itself
txtvar.Provider;
Note that the result comes from the first condition in the chain that is met, so if txtvar.BillText="City of Austin georgiapower.com"; then the result would be "Georgia Power".
You can use a dictionary.
Dictionary<string, string> textValue = new Dictionary<string, string>();
foreach (KeyValuePair<string, string> textKey in textValue)
{
if(txtvar.BillText.IndexOf(textKey.Key) > -1)
return textKey.Value;
}

C#: Insert strings to another string - performance issue

I have a string, which is long, and a sorted dictionary of indexes and values. I should go over the elements in the dictionary and insert the value to the specified index in the string. I wrote the following code, which works fine but is very slow:
private string restoreText(string text){
StringBuilder sb = new StringBuilder(text);
foreach(KeyValuePair<int, string> pair in _tags){
sb.Insert(pair.Key, pair.Value);
}
return sb.ToString();
}
The dictionary might be very big and contain 500,000 elements.
I think that what makes this function slow is the Insert() method. For dictionary of 100,000 elements, it took almost 5 seconds.
Is there a more efficient way to write this method?
Thanks,
Maya
Better way would be to sort items for insertion and then append them one after another.
Since you didn't comment on the overlap, maybe you have your items sorted in the first place?
Your original code will give different results depending on the order that items are returned from _tags; I very much suspect this isn't your intent.
Instead, sort the tags into order and then add them into the string builder in correct sequence:
private string restoreText(string text)
{
StringBuilder sb = new StringBuilder();
foreach( KeyValuePair<int, string> pair in _tags.OrderBy(t => t.Key))
{
sb.Append(pair.Value);
}
return sb.ToString();
}
If you really want to make this go as fast as possible, initialise the capacity of the StringBuilder up front:
StringBuilder sb = new StringBuilder(_tags.Sum(k => k.Value.Length));
Update
I missed the text parameter originally used to initialise the StringBuilder.
In order to avoid shuffling text around in memory (as caused by StringBuilder.Insert()), we want to stick with using StringBuilder.Append().
We can do this by converting the original text into another sequence of KeyValuePair instances, merging those with the original list and processing in order.
It would look something like this (note: adhoc code):
private string restoreText(string text)
{
var textPairs
= text.Select( (c,i) => new KeyValuePair<int,string>(i, c.ToString()));
var fullSequence
= textPairs.Union(_tags).OrderBy(t => t.Key);
StringBuilder sb = new StringBuilder();
foreach( KeyValuePair<int, string> pair in fullSequence)
{
sb.Append(pair.Value);
}
return sb.ToString();
}
Note - I've made a whole heap of assumptions about your context, so this may not work quite right for you. Particularly be aware, that .Union() will discard duplicates, though there are easy workarounds for that.
What I don't get is whether your indices are set up so that one insert won't shift the others, but since your code implies "yes", I'll assume so too.
Can you test this one:
private string RestoreText(string text)
{
var sb = new StringBuilder();
var totalLen = 0;
var orgIndex = 0;
foreach (var pair in _tags.OrderBy(t => t.Key))
{
var toAdd = text.Substring(orgIndex, pair.Key - totalLen);
sb.Append(toAdd);
orgIndex += toAdd.Length;
totalLen += toAdd.Length;
sb.Append(pair.Value);
totalLen += pair.Value.Length;
}
if (orgIndex < text.Length) sb.Append(text.Substring(orgIndex));
return sb.ToString();
}
It only uses Append while producing the same result as your original code.
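To check that an append-only rewrite really matches the Insert() loop, both can be run on a small input and compared. The following sketch uses hypothetical sample data and assumes, as the original loop does, that the keys are visited in ascending order and refer to positions in the string as it grows:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical sample data: keys are positions in the string as it grows.
string text = "Hello world";
var tags = new SortedDictionary<int, string>
{
    { 0, "<b>" },
    { 8, "</b>" },
    { 13, "<i>" },
};

// Baseline: the original approach, one Insert() per tag.
var viaInsert = new StringBuilder(text);
foreach (var pair in tags)
{
    viaInsert.Insert(pair.Key, pair.Value);
}

// Append-only approach: copy the untouched text up to each tag, then the tag itself.
var viaAppend = new StringBuilder(text.Length + tags.Sum(t => t.Value.Length));
int sourceIndex = 0;
foreach (var pair in tags)
{
    // Number of original characters still missing before this tag's position.
    int count = pair.Key - viaAppend.Length;
    viaAppend.Append(text, sourceIndex, count);
    sourceIndex += count;
    viaAppend.Append(pair.Value);
}
// Copy the remainder of the original text.
viaAppend.Append(text, sourceIndex, text.Length - sourceIndex);

Console.WriteLine(viaInsert.ToString() == viaAppend.ToString()); // True
```

Using the Append(string, int, int) overload avoids allocating a temporary Substring for each gap, and the capacity passed to the constructor means the buffer does not need to grow during the loop.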
I don't know what your data looks like, but in my test it ran fast (564 ms).
Dictionary<int, string> _tags = new Dictionary<int, string>();
for (int i = 0; i < 1000000; i++)
{
_tags.Add(i, i.ToString().Length + "");
}
string text = new String('a' , 50000000);
Console.WriteLine("****************************************");
System.Diagnostics.Stopwatch sw = System.Diagnostics.Stopwatch.StartNew();
StringBuilder sb = new StringBuilder(text);
foreach (KeyValuePair<int, string> pair in _tags)
{
sb.Insert(pair.Key, pair.Value);
}
sw.Stop();
Console.WriteLine("sw:" + sw.ElapsedMilliseconds);
Console.ReadKey();
If you can use Append() instead of Insert(), it only takes 35 ms...

Method to check array list containing specific string

I have an ArrayList that imports records from a database.
Is there any method to check whether the ArrayList contains the schname that I want to match against another list, which comes from an API?
List<PrimaryClass> primaryList = new List<PrimaryClass>(e.Result);
PrimaryClass sc = new PrimaryClass();
foreach (string item in str)
{
for (int a = 0; a <= e.Result.Count - 1; a++)
{
string schname = e.Result.ElementAt(a).PrimarySchool;
string tophonour = e.Result.ElementAt(a).TopHonour;
string cca = e.Result.ElementAt(a).Cca;
string topstudent = e.Result.ElementAt(a).TopStudent;
string topaggregate = e.Result.ElementAt(a).TopAggregate;
string topimage = e.Result.ElementAt(a).TopImage;
if (item.Contains(schname))
{
}
}
}
This is what I have come up with so far, kindly correct any errors that I might have committed. Thanks.
How about ArrayList.Contains?
Try this
foreach( string row in arrayList){
if(row.Contains(searchString)){
//put your code here.
}
}
Okay, now you've shown that it's actually a List<T>, it should be easy with LINQ:
if (primaryList.Any(x => item.Contains(x.PrimarySchool)))
Note that you should really consider using foreach instead of a for loop to iterate over a list, unless you definitely need the index... and if you're dealing with a list, using the indexer is simpler than calling ElementAt.
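Put together, the foreach plus Any() shape might look like the sketch below. The anonymous objects are hypothetical stand-ins for the PrimaryClass records, and the strings in str are made-up sample API values:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the database records and the API strings.
var primaryList = new[]
{
    new { PrimarySchool = "Northview Primary" },
    new { PrimarySchool = "Eastlake Primary" },
};
var str = new List<string> { "Results for Eastlake Primary School", "Unrelated entry" };

var matched = new List<string>();
foreach (string item in str)
{
    // Any() stops at the first record whose school name occurs in the item.
    if (primaryList.Any(x => item.Contains(x.PrimarySchool)))
    {
        matched.Add(item);
    }
}

Console.WriteLine(matched.Count); // 1
```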
// check all types
var containsAnyMatch = arrayList.Cast<object>().Any(arg => arg.ToString() == searchText);
// check strings only
var containsStringMatch = arrayList.OfType<string>().Any(arg => arg == searchText);
