Adding an incremental number to duplicate string - c#

I'm working in C# (.NET 4, using Visual Studio) and I'm trying to figure out an algorithm to append incremental numbers to strings entered, based on existing strings in the program. I haven't had much luck searching around for an answer.
I have a List<string>. An example would be
{"MyItem (2)", "MyItem", "Other thing", "string here", "MyItem (1)"}
So say the user wants to add another string to this list, and they've selected "MyItem" as the string to add. So given the input and the existing list, the algorithm would return "MyItem (3)" as the new string to add.
It's the same behaviour as in Windows Explorer, where you keep adding new folders ("New Folder (1)", "New Folder (2)" and so on).
I'm trying to just loop through the list and figure out what the next logical number should be, but I'm getting stuck (and the code's getting large). Does anyone know an elegant way of doing this? (I'm not too good with Regex, so maybe that's what I'm missing.)

Get the input and search for it. If it's present in the list, count the copies that are already numbered ("input (n)") and add the input string with count + 1; otherwise just add the input to the list:
var input = Console.ReadLine(); // just for example
if (list.Any(x => x == input))
{
// count the copies that are already numbered, e.g. "MyItem (1)" and "MyItem (2)"
// (assumes the numbering has no gaps)
var count = list.Count(x => x.StartsWith(input + " ("));
list.Add(string.Format("{0} ({1})", input, count + 1));
}
else list.Add(input);

This should work:
var list = new List<string>{"MyItem (2)", "MyItem", "Other thing", "string here", "MyItem (1)"} ;
string str = "MyItem";
string newStr = str;
int i = 0;
while(list.Contains(newStr))
{
i++;
newStr = string.Format("{0} ({1})",str,i);
}
// newStr = "MyItem (3)"

The following is a useful extension method I came up with to simulate the behaviour of Windows Explorer.
I feel the previous answers were too simple and only partially satisfied the requirements; they were also not presented in a way that makes them easy to reuse.
This solution relies on you first identifying the list of strings you want to compare against. They might come from a file system or a database; it's up to you to resolve the list of values from your business domain. After that, the process of identifying the duplicates and generating a unique value is very repeatable.
Extension Method:
/// <summary>
/// Generate a uniquely numbered string to insert into this list
/// Uses convention of appending the value with the duplication index number in brackets "~ (#)"
/// </summary>
/// <remarks>This does not actually add the value to the list</remarks>
/// <param name="currentValues">The existing values to check against</param>
/// <param name="input">The string to evaluate against this collection</param>
/// <param name="comparison">[Optional] One of the enumeration values that specifies how the strings will be compared; defaults to OrdinalIgnoreCase</param>
/// <returns>A numbered variant of the input string that would be unique in the list of current values</returns>
public static string GetUniqueString(this IList<string> currentValues, string input, StringComparison comparison = StringComparison.OrdinalIgnoreCase)
{
// This matches the pattern we are using, i.e. "A String Value (#)"
var regex = new System.Text.RegularExpressions.Regex(@"\(([0-9]+)\)$");
// this is the comparison value that we want to increment
string prefix = input.Trim();
string result = input.Trim();
// let it through if there is no current match
if (currentValues.Any(x => x.Equals(input, comparison)))
{
// Identify if the input value has already been incremented (makes this more reusable)
var inputMatch = regex.Match(input);
if (inputMatch.Success)
{
// this is the matched value
var number = inputMatch.Groups[1].Captures[0].Value;
// remove the numbering from the alias to create the prefix
prefix = input.Replace(String.Format("({0})", number), "").Trim();
}
// Now evaluate all the existing items that have the same prefix
// NOTE: you can do this as one line in Linq, this is a bit easier to read
// I'm trimming the list for consistency
var potentialDuplicates = currentValues.Select(x => x.Trim()).Where(x => x.StartsWith(prefix, comparison));
int count = 0;
int maxIndex = 0;
foreach (string item in potentialDuplicates)
{
// Get the index from the current item
var indexMatch = regex.Match(item);
if (indexMatch.Success)
{
var index = int.Parse(indexMatch.Groups[1].Captures[0].Value);
var test = item.Replace(String.Format("({0})", index), "").Trim();
if (test.Equals(prefix, comparison))
{
count++;
maxIndex = Math.Max(maxIndex, index);
}
}
}
int nextIndex = Math.Max(maxIndex, count) + 1;
result = string.Format("{0} ({1})", prefix, nextIndex);
}
return result;
}
Implementation:
var list = new string [] { "MyItem (2)", "MyItem", "Other thing", "string here", "MyItem (1)" };
string input = Console.ReadLine(); // simplify testing, thanks @selman-genç
var result = list.GetUniqueString(input, StringComparison.OrdinalIgnoreCase);
// Display the result, you can add it to the list or whatever you need to do
Console.WriteLine(result);
Input | Result
---------------------------------
MyItem | MyItem (3)
myitem (1) | myitem (3)
MyItem (3) | MyItem (3)
MyItem (4) | MyItem (4)
MyItem 4 | MyItem 4
String Here | String Here (1)
a new value | a new value

Pseudo-code:
If the list has no such string, add it to the list.
Otherwise, set variable N = 1.
Scan the list and look for strings like the given string + " (*)" (here Regex would help).
If any such string is found, take the number from the parentheses and compare it against N. Set N = MAX(that number + 1, N).
After the list has been scanned, N contains the number to add.
So, add the string + " (N)" to the list.
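A minimal C# sketch of the pseudo-code above, assuming an exact, case-sensitive match and the "name (N)" format with a single space. UniqueNames and NextName are hypothetical names, and the method returns the name rather than adding it, leaving that step to the caller:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UniqueNames
{
    // Returns input unchanged if it is not in the list; otherwise "input (N)",
    // where N is one more than the highest existing "input (n)" (minimum 1).
    public static string NextName(IList<string> list, string input)
    {
        if (!list.Contains(input))
            return input;

        int n = 1;
        // Matches exactly "input (digits)"; Regex.Escape protects characters such as '(' in input.
        var pattern = new Regex("^" + Regex.Escape(input) + @" \((\d+)\)$");
        foreach (string item in list)
        {
            Match m = pattern.Match(item);
            if (m.Success)
                n = Math.Max(int.Parse(m.Groups[1].Value) + 1, n);
        }
        return string.Format("{0} ({1})", input, n);
    }
}
```

With the list from the question, NextName(list, "MyItem") gives "MyItem (3)", and a name not in the list comes back unchanged.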

Related

Parsing Line Breaks from Plain Text

I have a process that parses emails. The software that we're using to retrieve and store the contents of the body doesn't seem to include line-breaks, so I end up with something like this -
Good afternoon, [line-break] this is my email. [line-break] Info: data [line-break] More info: data
My [line-break] brackets are where the line breaks should be. However, when we extract the body, we get just the text. It makes it tough to parse the text without having the line breaks.
Essentially, what I need to do is parse each [Info]: [Data]. I can find where the [Info] tags begin, but without line breaks, I'm struggling to know where the data associated with that info should end. The email is coming from Windows.
Is there any way to take plain text and encode it to some way that would include line breaks?
Example Email Contents
Good Morning, Order: 1234 The Total: $445 When: 7/10 Type: Dry
Good Morning, Order: 1235 The Total: $1743 Type: Frozen When: 7/22
Order: 1236 The Total: $950.14 Type: DRY When: 7/10
The Total: $514 Order: 1237 Type: Dry CSR: Tim W
Sorry, below is your order: Order: 1236 The Total: $500 When: 7/10 Type: Dry Creator: Josh A. Thank you
Now, I need to loop through the email and parse out the values for Order, Total, and Type. The other placeholder: values are irrelevant and random.
Try something like this.
You need to add all possible section identifiers. The list can be updated over time with more known identifiers, to reduce the chance of mistakes when parsing the strings.
As of now, if the value marked by a known identifier contains an unknown identifier when the string is parsed, that part is removed.
If an unknown identifier is encountered, it's ignored.
Regex.Matches extracts all matching parts and returns their Value, Index position and Length, so it's simple to use [Input].Substring(Index, NextPosition - Index) to return the value corresponding to the requested part.
The EmailParser class's GetPartValue(string) method returns the content of an identifier by its name (the name can include the colon or not, e.g. "Order" or "Order:").
The Matches property returns a Dictionary<string, string> of all matched identifiers and their content. The content is cleaned up, as far as possible, by the CleanUpValue() method.
Adjust this method to deal with specific or future requirements.
► If you don't pass a Pattern string, a default one is used.
► If you change the Pattern, setting the CurrentPatter property (perhaps using one stored in the app settings or edited in a GUI or whatever else), the Dictionary of matched values is rebuilt.
Initialize with:
string input = "Good Morning, Order: 1234 The Total: $445 Unknown: some value Type: Dry When: 7/10";
var parser = new EmailParser(input);
string value = parser.GetPartValue("The Total");
var values = parser.Matches;
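using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class EmailParser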
{
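// Default identifiers; the instance field below holds the pattern actually in use,
// so a custom pattern passed to one parser doesn't change the default for others
static readonly string DefaultPattern = "Order:|The Total:|Type:|Creator:|When:|CSR:";
string m_Pattern = DefaultPattern;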
public EmailParser(string email) : this(email, null) { }
public EmailParser(string email, string pattern)
{
if (!string.IsNullOrEmpty(pattern)) {
m_Pattern = pattern;
}
Email = email;
this.Matches = GetMatches();
}
public string Email { get; }
public Dictionary<string, string> Matches { get; private set; }
public string CurrentPatter {
get => m_Pattern;
set {
if (value != m_Pattern) {
m_Pattern = value;
this.Matches = GetMatches();
}
}
}
public string GetPartValue(string part)
{
if (string.IsNullOrEmpty(part)) throw new ArgumentException("Part name is required");
if (part[part.Length - 1] != ':') part += ':';
if (!Matches.ContainsKey(part)) {
throw new ArgumentException("Part not included");
}
return Matches[part];
}
private Dictionary<string, string> GetMatches()
{
var dict = new Dictionary<string, string>();
var matches = Regex.Matches(Email, m_Pattern, RegexOptions.Singleline);
foreach (Match m in matches) {
int startPosition = m.Index + m.Length;
var next = m.NextMatch();
string parsed = next.Success
? Email.Substring(startPosition, next.Index - startPosition).Trim()
: Email.Substring(startPosition).Trim();
dict.Add(m.Value, CleanUpValue(parsed));
}
return dict;
}
private string CleanUpValue(string value)
{
int pos = value.IndexOf(':');
if (pos < 0) return value;
return value.Substring(0, value.LastIndexOf((char)32, pos));
}
}
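The core of GetMatches (each identifier's value runs from the end of its match to the start of the next match) can also be sketched as a standalone function. This is a minimal sketch under the same assumption that the pattern lists every identifier that may appear. PartSplitter and Split are hypothetical names; unlike the class, it has no CleanUpValue step and keeps the last value if an identifier repeats, where the class's dict.Add would throw:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PartSplitter
{
    // Maps each matched identifier to the text between it and the next identifier.
    public static Dictionary<string, string> Split(string email, string pattern)
    {
        var result = new Dictionary<string, string>();
        MatchCollection matches = Regex.Matches(email, pattern);
        for (int i = 0; i < matches.Count; i++)
        {
            int start = matches[i].Index + matches[i].Length;
            // The value ends where the next identifier begins, or at the end of the text.
            int end = i + 1 < matches.Count ? matches[i + 1].Index : email.Length;
            result[matches[i].Value] = email.Substring(start, end - start).Trim();
        }
        return result;
    }
}
```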

Getting NullReferenceException when trying to add a value to an array

I'm having issues with this code.
Every time it runs, it throws a 'System.NullReferenceException'.
// Clear out the Array of code words
wordBuffer = null;
Int32 syntaxCount = 0;
// Create the regular expression object to match against the string
Regex defaultRegex = new Regex(@"\w+|[^A-Za-z0-9_ \f\t\v]",
RegexOptions.IgnoreCase | RegexOptions.Singleline);
Match wordMatch;
// Loop through the string and continue to record
// words and symbols and their corresponding positions and lengths
for (wordMatch = defaultRegex.Match(s); wordMatch.Success; wordMatch = wordMatch.NextMatch())
{
var word = new Object[3] { wordMatch.Value, wordMatch.Index, wordMatch.Length };
wordBuffer[syntaxCount] = word;
Debug.WriteLine("Found = " + word[0]);
syntaxCount++;
}
// return the number of symbols and words
return syntaxCount;
The exception occurs on these lines:
Debug.WriteLine("Found = " + word[0]);
syntaxCount++;
Specifically when trying to get the word[0] value, and on the second line with syntaxCount, but neither of those values is null, as you can see in the image below:
The variable "s" is just a line of a RichEditBox and word[0] has a value, so why is it throwing the NullReferenceException? syntaxCount has a value too :/
You are getting the exception on the line wordBuffer[syntaxCount] = word;
You are using the wrong approach for storing the results. Arrays are not created automatically and they do not grow automatically. I.e., you need to define their size in advance with string[] arr = new string[size]. Use a list instead, as you do not know the size in advance here. Lists grow automatically:
// Initialize with (each entry is the object[3] { value, index, length } built in the loop)
var wordBuffer = new List<object[]>();
// ...
// And then add a word to the list with
wordBuffer.Add(word);
You query the number of entries with wordBuffer.Count and you can access the items as in an array, once they have been added: wordBuffer[i], where the index goes from 0 to wordBuffer.Count - 1. This makes the variable syntaxCount superfluous.
And of course, you can loop through a list with foreach.
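Putting the answer together, the question's loop can store the matches in a list that grows as needed. A minimal sketch, assuming C# 7 or later for value tuples; Tokenizer and Tokenize are hypothetical names, and the regex and options are copied from the question:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class Tokenizer
{
    // Same pattern as the question: runs of word characters, or any single character
    // that is not a letter, digit, '_', space, form feed, tab or vertical tab.
    static readonly Regex DefaultRegex = new Regex(@"\w+|[^A-Za-z0-9_ \f\t\v]",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<(string Value, int Index, int Length)> Tokenize(string s)
    {
        // The list grows automatically, so no size has to be known in advance.
        var words = new List<(string Value, int Index, int Length)>();
        for (Match m = DefaultRegex.Match(s); m.Success; m = m.NextMatch())
        {
            words.Add((m.Value, m.Index, m.Length));
        }
        return words; // words.Count replaces syntaxCount
    }
}
```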

What is the best way to parse this string in C#?

I have a string that I am reading from another system. It's basically a long string that represents a list of key value pairs that are separated by a space in between. It looks like this:
key:value[space]key:value[space]key:value[space]
So I wrote this code to parse it:
string myString = ReadinString();
string[] tokens = myString.Split(' ');
foreach (string token in tokens) {
string key = token.Split(':')[0];
string value = token.Split(':')[1];
. . . .
}
The issue now is that some of the values have spaces in them so my "simplistic" split at the top no longer works. I wanted to see how I could still parse out the list of key value pairs (given space as a separator character) now that I know there also could be spaces in the value field as split doesn't seem like it's going to be able to work anymore.
NOTE: I now confirmed that KEYs will NOT have spaces in them so I only have to worry about the values. Apologies for the confusion.
Use this regular expression:
\w+:[\w\s]+(?![\w+:])
I tested it on
test:testvalue test2:test value test3:testvalue3
It returns three matches:
test:testvalue
test2:test value
test3:testvalue3
You can change \w to any character set that can occur in your input.
Code for testing this:
var regex = new Regex(@"\w+:[\w\s]+(?![\w+:])");
var test = "test:testvalue test2:test value test3:testvalue3";
foreach (Match match in regex.Matches(test))
{
var key = match.Value.Split(':')[0];
var value = match.Value.Split(':')[1];
Console.WriteLine("{0}:{1}", key, value);
}
Console.ReadLine();
As Wonko the Sane pointed out, this regular expression will fail on values containing ':'. If you expect such a situation, use \w+:[\w: ]+?(?![\w+:]) as the regular expression. This will still fail when a colon in a value is preceded by a space, though... I'll think about a solution to this.
This cannot work without changing your split from a space to something else such as a "|".
Consider this:
Alfred Bester:Alfred Bester Alfred:Alfred Bester
Is this Key "Alfred Bester" & value "Alfred" or Key "Alfred" & value "Bester Alfred"?
string input = "foo:Foobarius Maximus Tiberius Kirk bar:Barforama zap:Zip Brannigan";
foreach (Match match in Regex.Matches(input, @"(\w+):([^:]+)(?![\w+:])"))
{
Console.WriteLine("{0} = {1}",
match.Groups[1].Value,
match.Groups[2].Value
);
}
Gives you:
foo = Foobarius Maximus Tiberius Kirk
bar = Barforama
zap = Zip Brannigan
You could try to URL-encode the content between the spaces (the keys and the values, not the : symbol), but this would require that you have control over the input method.
Or you could simply use another format (Like XML or JSON), but again you will need control over the Input Format.
If you can't control the input format, you could always use a regular expression that searches for single spaces followed by a word plus a colon.
Update (Thanks Jon Grant)
It appears that you can have spaces in the key and the value. If this is the case you will need to seriously rethink your strategy as even Regex won't help.
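The single-space-before-a-key idea can be sketched with Regex.Split and a lookahead, so the separating whitespace is consumed but the following key is kept. This assumes keys are \w+ with no spaces, as the question's note confirms; KeyValueSplitter and Parse are hypothetical names, and a value word directly followed by a colon (say "at 10:30") would still be mistaken for a new key:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeyValueSplitter
{
    // Splits only on whitespace that is directly followed by "word:", i.e. where a new key starts.
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (string token in Regex.Split(input.Trim(), @"\s+(?=\w+:)"))
        {
            // Split on the first colon only, so the rest of the token is the value.
            int colon = token.IndexOf(':');
            if (colon < 0) continue; // not a key:value token; ignore it
            pairs.Add(new KeyValuePair<string, string>(
                token.Substring(0, colon), token.Substring(colon + 1)));
        }
        return pairs;
    }
}
```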
string input = "key1:value key2:value key3:value";
Dictionary<string, string> dic = input.Split(' ').Select(x => x.Split(':')).ToDictionary(x => x[0], x => x[1]);
The first Split will produce an array:
"key:value", "key:value"
Then an array of arrays:
{ "key", "value" }, { "key", "value" }
And then a dictionary:
"key" => "value", "key" => "value"
Note that Dictionary<K,V> doesn't allow duplicate keys; it will throw an exception in such a case. If such a scenario is possible, use ToLookup().
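A sketch of the ToLookup() variant, keeping this answer's assumption that neither keys nor values contain spaces; LookupExample and Parse are hypothetical names. Splitting each pair on the first colon only (the count argument of 2) keeps any later colons in the value:

```csharp
using System;
using System.Linq;

public static class LookupExample
{
    // Assumes no spaces inside keys or values, as in the Dictionary version above.
    public static ILookup<string, string> Parse(string input)
    {
        return input
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(x => x.Split(new[] { ':' }, 2))            // split on the first colon only
            .ToLookup(x => x[0], x => x.Length > 1 ? x[1] : ""); // duplicate keys are grouped, not rejected
    }
}
```

Looking up a key that never appeared returns an empty sequence instead of throwing.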
Using a regular expression can solve your problem:
private void DoSplit(string str)
{
str = str.Trim() + " "; // ensure exactly one trailing space so the last pair matches too
string patterns = @"\w+:([\w+\s*])+[^!\w+:]";
var r = new System.Text.RegularExpressions.Regex(patterns);
var ms = r.Matches(str);
foreach (System.Text.RegularExpressions.Match item in ms)
{
string[] s = item.Value.Split(new char[] { ':' });
//Do something
}
}
This code will do it (given the rules below). It parses the keys and values and returns them in a Dictionary<string, string> data structure. Going by your example, I have added some code at the end that assumes the last value of the entire string/stream is followed by a [space]:
private Dictionary<string, string> ParseKeyValues(string input)
{
Dictionary<string, string> items = new Dictionary<string, string>();
string[] parts = input.Split(':');
string key = parts[0];
string value;
int currentIndex = 1;
while (currentIndex < parts.Length-1)
{
int indexOfLastSpace=parts[currentIndex].LastIndexOf(' ');
value = parts[currentIndex].Substring(0, indexOfLastSpace);
items.Add(key, value);
key = parts[currentIndex].Substring(indexOfLastSpace + 1);
currentIndex++;
}
// the last value runs to the end of the input; drop the trailing [space]
value = parts[parts.Length - 1].Substring(0, parts[parts.Length - 1].Length - 1);
items.Add(key, value);
return items;
}
Note: this algorithm assumes the following rules:
No spaces in the keys (values may contain spaces)
No colons in the keys
No colons in the values
Without any Regex or string concatenation, and as an enumerable (it assumes keys don't have spaces, but values can):
public static IEnumerable<KeyValuePair<string, string>> Split(string text)
{
if (text == null)
yield break;
int keyStart = 0;
int keyEnd = -1;
int lastSpace = -1;
for(int i = 0; i < text.Length; i++)
{
if (text[i] == ' ')
{
lastSpace = i;
continue;
}
if (text[i] == ':')
{
if (lastSpace >= 0)
{
yield return new KeyValuePair<string, string>(text.Substring(keyStart, keyEnd - keyStart), text.Substring(keyEnd + 1, lastSpace - keyEnd - 1));
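keyStart = lastSpace + 1;
lastSpace = -1; // reset, so a later ':' with no space before it can't reuse this position (negative Substring length)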
}
keyEnd = i;
continue;
}
}
if (keyEnd >= 0)
yield return new KeyValuePair<string, string>(text.Substring(keyStart, keyEnd - keyStart), text.Substring(keyEnd + 1));
}
I guess you could take your method and expand upon it slightly to deal with this stuff...
Kind of pseudocode:
List<string> parsedTokens = new List<string>();
string[] tokens = myString.Split(' ');
for(int i = 0; i < tokens.Length; i++)
{
// We need to deal with the special case of the last item,
// or if the following item does not contain a colon.
if(i == tokens.Length - 1 || tokens[i+1].IndexOf(':') > -1)
{
parsedTokens.Add(tokens[i]);
}
else
{
// This bit needs to be refined to deal with values with multiple spaces...
parsedTokens.Add(tokens[i] + " " + tokens[i+1]);
i++; // skip the token that was just merged
}
}
Another approach would be to split on the colon... That way, your first array item would be the name of the first key, second item would be the value of the first key and then name of the second key (can use LastIndexOf to split it out), and so on. This would obviously get very messy if the values can include colons, or the keys can contain spaces, but in that case you'd be pretty much out of luck...

Searching a String for a certain thing, then removing up to a certain point to a list

I'm working on an automatic downloader of sorts for personal use. So far I have set up the program to store the source of the provided link in a string, and the links to the downloads are written in plain text in the source. What I need to be able to do is search the string for, say, "http://media.website.com/folder/" and return all occurrences as a list. The problem is that I also need the unique ID given for each file after "/folder/" to be stored with each occurrence. Any ideas? I'm using Visual C#.
Thanks!!!
Steven
Maybe something like this?
Dictionary<string, string> dictionary = new Dictionary<string, string>();
string searchText = "Text to search here";
string textToFind = "Text to find here";
string fileID = "";
bool finished = false;
int foundIndex = 0;
while (!finished)
{
foundIndex = searchText.IndexOf(textToFind, foundIndex);
if (foundIndex == -1)
{
finished = true;
}
else
{
//get the file ID; change to whatever logic makes sense. In this example
//it assumes a 2-character identifier following the text to find
fileID = searchText.Substring(foundIndex + textToFind.Length, 2);
dictionary.Add(fileID, textToFind);
//move past this occurrence so the next IndexOf finds the following one
foundIndex += textToFind.Length;
}
}
Use Regex to get the matches; that will give you a collection of all of them. Use a pattern for the numeric value that differs between matches, so you can parse it out.
I'm not great with Regex, but it'd be something like:
Regex.Matches(<your string>, @"(http://media\.website\.com/folder/)(\d+)")
Or
var textToFind = "http://media.website.com/folder/";
var ids = from l in listOfUrls where l.StartsWith(textToFind) select new { RawUrl = l, ID = l.Substring(textToFind.Length) };
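A slightly fuller sketch of the Regex.Matches approach, returning each full link together with its ID. It assumes the ID is a run of letters, digits, '_' or '-' directly after the prefix (adjust the character class to the real format); LinkFinder and FindIds are hypothetical names. Regex.Escape keeps the dots in the prefix from matching any character:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkFinder
{
    // Returns (full URL, ID) pairs for every occurrence of prefix followed by an ID.
    // The ID character class [\w-] is an assumption about the site's format.
    public static List<KeyValuePair<string, string>> FindIds(string source, string prefix)
    {
        var results = new List<KeyValuePair<string, string>>();
        var regex = new Regex(Regex.Escape(prefix) + @"([\w-]+)");
        foreach (Match m in regex.Matches(source))
        {
            results.Add(new KeyValuePair<string, string>(m.Value, m.Groups[1].Value));
        }
        return results;
    }
}
```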

Fastest way to compare a string with an array of strings in C#2.0

What is the fastest way to compare a string with an array of strings in C#2.0
What kind of comparison do you want? Do you want to know if the given string is in the array?
bool targetStringInArray = array.Contains(targetString);
do you want an array of comparison values (positive, negative, zero)?
var comparisons = array.Select(x => targetString.CompareTo(x));
If you're checking for containment (i.e. the first option) and you're going to do this with multiple strings, it would probably be better to build a HashSet<string> from the array:
var stringSet = new HashSet<string>(array);
if (stringSet.Contains(firstString)) ...
if (stringSet.Contains(secondString)) ...
if (stringSet.Contains(thirdString)) ...
if (stringSet.Contains(fourthString)) ...
You mean to see if the string is in the array? I can't remember if arrays support the .Contains() method; if not, create a List<string>, add your array to the list via AddRange(), then call list.Contains({string to compare}). It returns a boolean indicating whether or not the string is in the array.
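For what it's worth, arrays do support Contains() without LINQ: since .NET 2.0 a string[] implements ICollection<string>, whose Contains is reachable through a cast, so no copy into a List<string> is needed. A minimal sketch; ArrayLookup and InArray are hypothetical names:

```csharp
using System.Collections.Generic;

public static class ArrayLookup
{
    // The array's ICollection<string>.Contains is only visible through the interface,
    // hence the cast. This is still a linear scan of the array.
    public static bool InArray(string[] array, string value)
    {
        return ((ICollection<string>)array).Contains(value);
    }
}
```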
If you are doing this many times with a single array, you should sort the array and binary search it:
Array.Sort(array);
int index = Array.BinarySearch(array, input);
// if (index < 0)
//   the input does not exist; items at indices >= ~index are larger, those at indices < ~index are smaller
// otherwise, items at indices > index are larger and those at indices < index are smaller.
Otherwise just check the whole array naively:
bool exists = Array.IndexOf(array, input) >= 0;
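A small sketch of how the BinarySearch result can be used, assuming the array was sorted with Array.Sort using the same default comparer; SortedLookup and Find are hypothetical names. When the value is missing, ~index is the position where it could be inserted while keeping the array sorted:

```csharp
using System;

public static class SortedLookup
{
    // Returns true if value is in the sorted array. insertAt is the index of the match,
    // or, when the value is missing, where it would go to keep the order.
    public static bool Find(string[] sorted, string value, out int insertAt)
    {
        int index = Array.BinarySearch(sorted, value);
        insertAt = index >= 0 ? index : ~index;
        return index >= 0;
    }
}
```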
If your requirement is to see whether one list is part of another, note that List<T>.Contains() only checks for a single item; use Except() instead and check whether anything is left over.
Let's say
List<string> list1 = new List<string>(){"1", "2"};
List<string> list2 = new List<string>(){"1", "2", "3"};
bool isPart = !list1.Except(list2).Any(); // true: list1 is part of list2, but not vice versa
That said, if you want to know about an exact match rather than a partial one, check for a remainder in both directions:
bool sameItems = !list1.Except(list2).Any() && !list2.Except(list1).Any(); // false: "3" is left over
//get data in list from source
List<string> checklist = Directory.GetFiles(SourcePath, "*.*", SearchOption.AllDirectories).Where(x => x.ToLower().EndsWith("apk")).ToList();
//get data from a text file
List<string> ls = ReadFile();
foreach(string file in checklist)
{
//get file name
string filename = Path.GetFileName(file);
string TargetLocation = Path.Combine(TargetPath, filename);
//now compare single string to a list
//it returns true or false
if(ls.Contains(filename))
{
//do your task
//File.Copy(file, TargetLocation);
}
}
