How to generate a unique string from a string collection? - c#

I need a way to convert a strings collection into a unique string. This means that I need to have a different string if any of the strings inside the collection has changed.
I'm working on a big solution so I may wont be able to work with some better ideas. The required unique string will be used to compare the 2 collections, so different strings means different collections. I cannot compare the strings inside one by one because the order may change plus the solution is already built to return result based on 2 strings comparison. This is an add-on. The generated string will be passed as parameter for this comparison.
Thank you!

These both work by deciding to use the separator character of ":" and also using an escape character to make it clear when we mean something else by the separator character. We therefore just need to escape all our strings before concatenating them with our separator in between. This gives us unique strings for every collection. All we need to do if we want to make collections the same regardless or order is to sort our collection before we do anything. I should add that my sample uses LINQ and thus assumes the collection implements IEnumerable<string> and that you have a using declaration for System.LINQ
You can wrap that up in a function as follows
string GetUniqueString(IEnumerable<string> Collection, bool OrderMatters = true, string Escape = "/", string Separator = ":")
{
if(Escape == Separator)
throw new Exception("Escape character should never equal separator character because it fails in the case of empty strings");
if(!OrderMatters)
Collection = Collection.OrderBy(v=>v);//Sorting fixes ordering issues.
return Collection
.Select(v=>v.Replace(Escape, Escape + Escape).Replace(Separator,Escape + Separator))//Escape String
.Aggregate((a,b)=>a+Separator+b);
}

What about using a hash function?

Considering you constraints, use a delimited approach:
pick a delimiter and an escape method.
e.g. use ; and escape it bwithin strings y \;, also escape \ by \\
So this list of strings...
"A;bc"
"D\ef;"
...becomes "A\;bc;D\\ef\;"
It ain't pretty, but considering that it has to be a string, then the good old ways of csv and its brethren isn't all too bad.

By a "collection string" you mean "collection of strings"?
Here's a naive (but working) approach: sort the collection (to eliminate dependency on order), concat them, and take a hash of that (MD5 for instance).
Trivial to implement, but not very clever performance-wise.

Are you saying that you need to encode a string collection as a string. So for example the collection {"abc", "def"} may be encoded as "sDFSDFSDFSD" but {"a", "b"} might be encoded as "SDFeg". If so and you don't care about unique keys then you could use something like SHA or MD5.

Related

Splitting large string in c# and adding values in it to a List

I have a string as shown below
string names = "<?startname; Max?><?startname; Alex?><?startname; Rudy?>";
is there any way I can split this string and add Max , Alex and Rudy into a separate list ?
Sure, split on two strings (all that consistently comes before, and all that consistently comes after) and specify that you want Split to remove the empties:
var r = names.Split(new[]{ "<?startname; ", "?>" }, StringSplitOptions.RemoveEmptyEntries);
If you take out the RemoveEmptyEntries it will give you a more clear idea of how the splitting is working, but in essence without it you'd get your names interspersed with array entries that are empty strings because split found a delimiter (the <?...) immediately following another (the ?>) with an empty string between the delimiters
You can read the volumes of info about this form of split here - that's a direct link to netcore3.1, you can change your version in the table of contents - this variant of Split has been available since framework2.0
You did also say "add to a separate list" - didn't see any code for that so I guess you will either be happy to proceed with r here being "a separate list" (an array actually, but probably adequately equivalent and easy to convert with LINQ's ToList() if not) or if you have another list of names (that really is a List<string>) then you can thatList.AddRange(r) it
Another Idea is to use Regex
The following regex should work :
(?<=; )(.*?)(?=\s*\?>)

c# Remove elements from List containing string [duplicate]

What would be the fastest way to check if a string contains any matches in a string array in C#? I can do it using a loop, but I think that would be too slow.
Using LINQ:
return array.Any(s => s.Equals(myString))
Granted, you might want to take culture and case into account, but that's the general idea.
Also, if equality is not what you meant by "matches", you can always you the function you need to use for "match".
I really couldn't tell you if this is absolutely the fastest way, but one of the ways I have commonly done this is:
This will check if the string contains any of the strings from the array:
string[] myStrings = { "a", "b", "c" };
string checkThis = "abc";
if (myStrings.Any(checkThis.Contains))
{
MessageBox.Show("checkThis contains a string from string array myStrings.");
}
To check if the string contains all the strings (elements) of the array, simply change myStrings.Any in the if statement to myStrings.All.
I don't know what kind of application this is, but I often need to use:
if (myStrings.Any(checkThis.ToLowerInvariant().Contains))
So if you are checking to see user input, it won't matter, whether the user enters the string in CAPITAL letters, this could easily be reversed using ToLowerInvariant().
Hope this helped!
That works fine for me:
string[] characters = new string[] { ".", ",", "'" };
bool contains = characters.Any(c => word.Contains(c));
You could combine the strings with regex or statements, and then "do it in one pass," but technically the regex would still performing a loop internally. Ultimately, looping is necessary.
If the "array" will never change (or change only infrequently), and you'll have many input strings that you're testing against it, then you could build a HashSet<string> from the array. HashSet<T>.Contains is an O(1) operation, as opposed to a loop which is O(N).
But it would take some (small) amount of time to build the HashSet. If the array will change frequently, then a loop is the only realistic way to do it.

Search for an exact phrase and change

I have a problem with changing values ​​into a string.
I have three values, e.g. a, aa, aaa
StringBuilder builders = new StringBuilder(string);
builders.Replace("a", "ab");
builders.Replace("aa", "bab");
builders.Replace("aaa", "bba");
string new_string = builders.ToString();
How to do it to find exactly the value of 'aaa' because for this value both the condition of 'a' and 'aa' is also fulfilled.
Just do the replaces from the longest substring to the shortest one:
builders.Replace("aaa", "bba");
builders.Replace("aa", "bab");
builders.Replace("a", "ab");
The easiest way is to replace the long tokens first.
It looks like your replacement strings also contain "a". So you might need to replace them with a temporary string (one that is unlikely to occur in the original text), and then convert to the target string after you've processed them all.
The only way to intelligently only replace "a" without replacing "aa" is to parse each character in the text to determine how long each token is. It's not that hard, but a lot more work than calling Replace().

Most efficient way of adding/removing a character to beginning of string?

I was doing a small 'scalable' C# MVC project, with quite a bit of read/write to a database.
From this, I would need to add/remove the first letter of the input string.
'Removing' the first character is quite easy (using a Substring method) - using something like:
String test = "HHello world";
test = test.Substring(1,test.Length-1);
'Adding' a character efficiently seems to be messy/awkward:
String test = "ello World";
test = "H" + test;
Seeing as this will be done for a lot of records, would this be be the most efficient way of doing these operations?
I am also testing if a string starts with the letter 'T' by using, and adding 'T' if it doesn't by:
String test = "Hello World";
if(test[0]!='T')
{
test = "T" + test;
}
and would like to know if this would be suitable for this
If you have several records and to each of the several records field you need to append a character at the beginning, you can use String.Insert with an index of 0 http://msdn.microsoft.com/it-it/library/system.string.insert(v=vs.110).aspx
string yourString = yourString.Insert( 0, "C" );
This will pretty much do the same of what you wrote in your original post, but since it seems you prefer to use a Method and not an operator...
If you have to append a character several times, to a single string, then you're better using a StringBuilder http://msdn.microsoft.com/it-it/library/system.text.stringbuilder(v=vs.110).aspx
Both are equally efficient I think since both require a new string to be initialized, since string is immutable.
When doing this on the same string multiple times, a StringBuilder might come in handy when adding. That will increase performance over adding.
You could also opt to move this operation to the database side if possible. That might increase performance too.
For removing I would use the remove command as this doesn't require to know the length of the string:
test = test.Remove(0, 1);
You could also treat the string as an array for the Add and use
test = test.Insert(0, "H");
If you are always removing and then adding a character you can treat the string as an array again and just replace the character.
test = (test.ToCharArray()[0] = 'H').ToString();
When doing lots of operations to the same string I would use a StringBuilder though, more expensive to create but faster operations on the string.

How do I prevent a string from appearing in a result string when a set of child strings are concatenated to form the result string?

I have 5 strings, let's call them
EarthString
FireString
WindString
WaterString
HeartString
All of them can have varying length, any of them can be empty, or can be very long (but never null).
These 5 strings are very good friends, and every weekend they are concatenated to form a result string using this c# statement
ResultString = EarthString + FireString + WindString + WaterString + HeartString
Depending on the values of these strings, sometimes (only sometimes), ResultString will contain "Captain Planet" as a substring.
My question is, how do I manipulate each of the 5 strings before they are concatenated, so that when they are combined, "Captain Planet" will never appear as a substring in the resultant string?
The only way I can think of right now is to examine each character in each string, in sequential order, but that seems very tedious. Since each of the 5 good friends strings can be of any length, examining the characters individually will also require some kind of concatenation before we can determine whether any character need to be dropped.
Edit: The resultant string is a filtered version of the 5 strings concatenated together, all the other content remain the same except the "Captain Planet" string is dropped. Yes, i'm looking for a solution which allows the 5 strings to be manipulated before concatenation. (this is actually a simplification of a bigger programming problem i'm encountering). Thanks guys.
If you want to do it pre-concat you could
Assign the start and end of each string a numeric value based on the portion of "CaptainPlanet" they contein. Ex: if Air = "net the big captain" then it would get 3 for a start value and 7 for an end value. to determine if you could concat 2 values safely you would just check to see if the end of the left string + start of the right string were not equal to the total length of "CaptainPlanet". If you had very large strings this would allow you to inspect just the first x and last x characters of the string to compute the start/end value.
This solution doesn't account for short strings like ei air = "Cap" , earth ="tain" and fire="Planet". In that case you would need to have a special case for tokens that are shorter than the length of "CaptainPlanet" For those.
Is there a particular reason you can't just do this?
ResultString.Replace("CaptainPlanet", "x");
If it doesn't matter how many chars will be dropped, you can remove f.e. all 'C' in all strings.
The original answer cleared all of the strings, but as pointed out by J.Steen, there was already a formulation of the expected output. So there we go.
Run elementString.Replace("Captain Planet", "") on every substring.
Now you have to identify all the prefixes / suffixes of "Captain Planet" on each of the substrings, and keep that information so that it can be processed before contatenation. That is, e.g. if the substring ends with "Capt", then you should have an information that "substring contains at the end a prefix of the 4 first letters of 'Captain Planet'". You also have to consider the cases of complete substrings (e.g. one of the strings is "ptain Pla"). The problem also becomes more complex if any of the e.g. prefixes can be recursive or repeated (e.g. "CaptainCap" contains 2 kinds of valid prefixes for "CaptainCaptain", and "apt" can be found at two locations in the resulting string);
You process that information before concatenation so that the result string has the same thing as ResultString.Replace("Captain Planet", ""). Congratulations, you have made your program much more complex than necessary!
But in short, you cannot get both the result that you want (all of the substrings intact except for the combined result output) and do the processing wholly before the concatenation step.

Categories