Combining Two Lists Based on first 6 Characters - c#

I have two lists (Current & New). Current can contain 100,000+ strings each starting with a unique number. New can contain anywhere between 50 and 200 strings each with a unique number.
If New contains a string starting with the same 6 characters as an entry in Current, it should replace that entry in Current. Any entries that exist in New but not in Current should be added to Current. I've considered Union, Concat and Intersect, but each only deals with the entire string.
Is there some way to compare just the first 6 characters of an item in a list and replace the entry in Current if it exists in New?
Perhaps the easiest way to visualise the above is:
Current
123456 66 Park Avenue Sydney
New
123456 88 River Road Sydney
The result in Current needs to be
123456 88 River Road Sydney
If Current.Union(New, first X characters) was possible it would be perfect.
Any suggestions on how to combine the two lists without duplicates, based on the first 6 characters, would be greatly appreciated.

string.StartsWith is what you are looking for.

This should do it. Note that I have used two dictionaries because Current might contain duplicates that need replacing; if it doesn't, you can use one.
public static void Coder42(List<string> current, IEnumerable<string> news)
{
    // newDict is used for lookups; unfound tracks the entries of news not yet matched in current.
    Dictionary<string, string> newDict = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    Dictionary<string, string> unfound = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    foreach (var n in news)
    {
        if (n.Length < 6) throw new Exception("Too short");
        var ss = n.Substring(0, 6);
        if (newDict.ContainsKey(ss)) throw new Exception("Can't be too new.");
        newDict[ss] = n;
        unfound[ss] = n;
    }
    for (int i = 0; i < current.Count; i++)
    {
        var s = current[i];
        if (s.Length >= 6)
        {
            var ss = s.Substring(0, 6);
            if (newDict.TryGetValue(ss, out string replacement))
            {
                // Replace in place; the key stays in newDict so duplicates in current are replaced too.
                current[i] = replacement;
                unfound.Remove(ss);
            }
        }
    }
    // Anything still in unfound did not exist in current, so append it.
    foreach (var pair in unfound)
        current.Add(pair.Value);
}
And tested using:
var current = new List<string>();
current.Add("123456 a");
current.Add("123457 b");
current.Add("123458 c");
var news = new List<string>();
news.Add("123457 q");
news.Add("123456 p");
news.Add("123459 z");
Coder42(current, news);
foreach (var s in current) Console.WriteLine(s);
Console.ReadLine();
Gives:
123456 p
123457 q
123458 c
123459 z
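For comparison, here is a minimal sketch of the same merge that indexes New in one dictionary and returns a new list instead of modifying Current in place. The names PrefixMerge, Merge and KeyLength are made up for this illustration, and it assumes that every string in New is at least six characters long and that prefixes within New are unique. Unlike the answer above, it compares prefixes with StringComparer.Ordinal.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrefixMerge
{
    // Number of leading characters that identify an entry (6 in the question).
    private const int KeyLength = 6;

    // Returns a new list: the entries of current, replaced where news has the
    // same prefix, followed by the entries of news that matched nothing.
    public static List<string> Merge(IEnumerable<string> current, IEnumerable<string> news)
    {
        var newList = news.ToList();
        // Index the small list by prefix; ToDictionary throws if two entries share a prefix.
        var byPrefix = newList.ToDictionary(n => n.Substring(0, KeyLength), StringComparer.Ordinal);
        var matched = new HashSet<string>(StringComparer.Ordinal);

        var result = new List<string>();
        foreach (var s in current)
        {
            // Entries shorter than the key length can never match and are kept as-is.
            var key = s.Length >= KeyLength ? s.Substring(0, KeyLength) : null;
            if (key != null && byPrefix.TryGetValue(key, out var replacement))
            {
                result.Add(replacement);
                matched.Add(key);
            }
            else
            {
                result.Add(s);
            }
        }

        // Append, in their original order, the new entries that replaced nothing.
        result.AddRange(newList.Where(n => !matched.Contains(n.Substring(0, KeyLength))));
        return result;
    }
}
```

Calling PrefixMerge.Merge(current, news) with the test data above should yield the same four lines. Because only New is indexed, the cost is one pass over Current plus one pass over New.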

Related

How do I display a list and a list of lists side by side

I found code that displays two lists side by side, but I have had no luck doing the same with a list and a list of lists.
This is the code for two lists side by side:
for (var i = 0; i < bncount; i++)
{
    //Console.WriteLine(String.Format("{0,-10} | {1,-10}", hed.ElementAt(i),bin.ElementAt(i)));
    Console.WriteLine(String.Format("{0,-10} | {1,-10}", i < hdcount ? hed[i] : string.Empty, i < bncount ? bin[i] : string.Empty));
}
But string.Empty only works for plain lists, not for a list of lists, and ElementAt() wouldn't work either.
I tried using LINQ with foreach, but with no success.
hed is a list of strings and bin is a list of lists, each holding a sequence of numbers.
My outputs are as follows:
foreach(var r in bin) //0010 1110 1111
foreach(var m in hed) //red blue white
I want to have the following output
red 0010
blue 1110
white 1111
or
red blue white
0 1 1
0 1 1
1 1 1
0 0 1
Any idea how to do this in C# in general, or in LINQ? The methods I tried resulted in either printing only one value from hed and all the values from bin, or the opposite.
I'm not sure I understood the question correctly; it would help to extend the code examples to include the variable definitions. Anyway, if I understood correctly, this would be my approach:
var listOfString = new List<string>( )
{
    "red",
    "blue",
    "white"
};
var listOfArrays = new List<int[]>( )
{
    new int[] { 0,0,1,0 },
    new int[] { 0,1,1,1 },
    new int[] { 1,1,1,1 }
};
// Here you could add a condition in case you are not 100% sure your arrays are of same length.
for( var i = 0; i < listOfString.Count; i++ )
{
    var stringItem = listOfString[i];
    var arrayItem = listOfArrays[i];
    Console.WriteLine( $"{stringItem} {string.Join( null, arrayItem )}" );
}
I would suggest using a different structure for storing your data (considering OOP principles) and the following code for printing the data out:
public class Item
{
    public string Name { get; set; }
    public List<int> Values { get; set; }
}
public void Print(List<Item> items)
{
    foreach (var item in items)
    {
        Console.WriteLine($"{item.Name} {string.Join("", item.Values)}");
    }
}
The first version is not so hard:
string result = string.Join("\n", bin.Zip(hed).Select(x => $"{x.Item2} {x.Item1}"));
With Zip, we create an enumerable of tuples, where the n-th tuple holds the n-th element of bin (Item1) and the n-th element of hed (Item2). You can then just concatenate those two items, name first.
The second version is a bit more complex:
result = string.Join("\t", hed) + "\n" +
    string.Join("\n", Enumerable.Range(0, bin.First().Length)
        .Select(x => string.Join("\t\t", bin.Select(str => str[x]))));
We create the heading line by just joining the hed strings. Then we create an enumerable of numbers which represent the indexes in the string. The enumerable will be 0, 1, 2, 3. Then we take the char with index 0 of each string of the bin list, then the char with index 1 of each string of the bin list, and so on.
Online demo: https://dotnetfiddle.net/eBQ54N
You can use a dictionary:
var hed = new List<string>(){"red", "blue", "white"};
var bin = new List<string>(){ "0010", "1110", "1111" };
Dictionary<string, string> Dic = new Dictionary<string, string>();
for (int i = 0; i < hed.Count; i++)
{
    Dic.Add(hed[i], bin[i]);
}
foreach (var item in Dic)
{
    Console.WriteLine(item.Key + " " + item.Value);
}
Console.ReadKey();
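The second layout in the question (names across the top, digits running down) can also be built from a list of strings and a list of int arrays. The following is only a sketch under assumptions: SideBySide, Columns and Width are hypothetical names, the column width of 6 is arbitrary, and the list of arrays is assumed to be non-empty.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SideBySide
{
    // Arbitrary column width chosen for this illustration.
    private const int Width = 6;

    // Builds one header line followed by one line per row index;
    // arrays shorter than the longest one leave their cell blank.
    public static List<string> Columns(IList<string> headers, IList<int[]> columns)
    {
        var lines = new List<string>();
        lines.Add(string.Join(" ", headers.Select(h => h.PadRight(Width))).TrimEnd());

        int rows = columns.Max(c => c.Length);
        for (int r = 0; r < rows; r++)
        {
            var cells = columns.Select(c => (r < c.Length ? c[r].ToString() : "").PadRight(Width));
            lines.Add(string.Join(" ", cells).TrimEnd());
        }
        return lines;
    }
}
```

Each returned line can be passed to Console.WriteLine. Padding every cell to the same width keeps the columns aligned without relying on tab stops.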

C# looping through a list to find character counts

I'm trying to loop through a string to find the character, ASCII value, and the number of times the character occurs. So far, I have found each unique character and ASCII value using foreach statements, and finding if the value was already in the list, then don't add it, otherwise add it. However I'm struggling with the count portion. I was thinking the logic would be "if I am already in the list, don't count me again, however, increment my frequency"
I've tried a few different things, such as trying to find the index of the character it found and adding to that specific index, but I'm lost.
string String = "hello my name is lauren";
char[] String1 = String.ToCharArray();
// int [] frequency = new int[String1.Length]; //array of frequency counter
int length = 0;
List<char> letters = new List<char>();
List<int> ascii = new List<int>();
List<int> frequency = new List<int>();
foreach (int ASCII in String1)
{
    bool exists = ascii.Contains(ASCII);
    if (exists)
    {
        //add to frequency at same index
        //ascii.Insert(1, ascii);
        //get { ASCII[index]; }
    }
    else
    {
        ascii.Add(ASCII);
        //add to frequency at new index
    }
}
foreach (char letter in String1)
{
    bool exists = letters.Contains(letter);
    if (exists)
    {
        //add to frequency at same index
    }
    else
    {
        letters.Add(letter);
        //add to frequency at new index
    }
}
length = letters.Count;
for (int j = 0; j < length; ++j)
{
    Console.WriteLine($"{letters[j].ToString(),3} {"(" + ascii[j] + ")"}\t");
}
Console.ReadLine();
}
}
}
I'm not sure I understand your question, but what you are looking for may be a Dictionary<TKey, TValue> instead of a List<T>. Here are example solutions to the problems I think you are trying to solve.
Counting how often each character appears
Dictionary<int, int> frequency = new Dictionary<int, int>();
foreach (int j in String)
{
    if (frequency.ContainsKey(j))
    {
        frequency[j] += 1;
    }
    else
    {
        frequency.Add(j, 1);
    }
}
Method to link characters to their ASCII
Dictionary<char, int> ASCIIofCharacters = new Dictionary<char, int>();
foreach (char i in String)
{
    if (!ASCIIofCharacters.ContainsKey(i))
    {
        ASCIIofCharacters.Add(i, (int)i);
    }
}
A simple LINQ approach is to do this:
string String = "hello my name is lauren";
var results =
    String
        .GroupBy(x => x)
        .Select(x => new { character = x.Key, ascii = (int)x.Key, frequency = x.Count() })
        .ToArray();
That gives me one result per distinct character, each carrying the character, its ASCII value, and its frequency.
If I understood your question, you want to map each char in the provided string to the count of times it appears in the string, right?
If that is the case, there are tons of ways to do that, and you also need to choose in which data structure you want to store the result.
Assuming you want to use linq and store the result in a Dictionary<char, int>, you could do something like this:
static IDictionary<char, int> getAsciiAndFrequencies(string str) {
    return (
        from c in str
        group c by Convert.ToChar(c)
    ).ToDictionary(c => c.Key, c => c.Count());
}
And use it like this:
var f = getAsciiAndFrequencies("hello my name is lauren");
// result: { h: 1, e: 3, l: 3, o: 1, ... }
You are creating a histogram. But you should not use List.Contains, as it gets inefficient as the list grows: each call has to go through the list one item after another. Better to use a Dictionary, which is based on hashing and goes directly to the item. The code may look like this:
string str = "hello my name is lauren";
var dict = new Dictionary<char, int>();
foreach (char c in str)
{
    dict.TryGetValue(c, out int count);
    dict[c] = ++count;
}
foreach (var pair in dict.OrderBy(r => r.Key))
{
    Console.WriteLine(pair.Value + "x " + pair.Key + " (" + (int)pair.Key + ")");
}
which gives
4x (32)
2x a (97)
3x e (101)
1x h (104)
1x i (105)
3x l (108)
2x m (109)
2x n (110)
1x o (111)
1x r (114)
1x s (115)
1x u (117)
1x y (121)

Can I represent a given word (String) as a number?

Suppose I have a list of words e.g.
var words = new [] {"bob", "alice", "john"};
Is there a way to represent each of those words as a number, so that one could use those numbers to sort the words?
One use case I think this could serve is using Counting Sort to sort a list of words. Again, I am only interested in whether this is possible at all, not in whether it is the most efficient way to sort a list of words.
Do note this is not about hash-codes or different sorting algorithms. I am curious to find out if a string can be represented as a number.
You can use a dictionary instead of an array.
public class Program
{
    static void Main(string[] args)
    {
        IDictionary<int, string> words = new Dictionary<int, string>();
        words.Add(0, "bob");
        words.Add(1, "alice");
        words.Add(2, "john");
        foreach (KeyValuePair<int, string> word in words.OrderBy(w => w.Key))
        {
            Console.WriteLine(word.Value);
        }
        Console.ReadLine();
    }
}
NOTE: It's better to work with collections in place of arrays, as they are easier to use for most developers.
I don't understand the down votes but hey this is what I have come up with so far:
private int _alphabetLength = char.MaxValue - char.MinValue;
private BigInteger Convert(string data)
{
    var value = new BigInteger();
    var startPoint = data.Length - 1;
    for (int i = data.Length - 1; i >= 0; i--)
    {
        var character = data[i];
        var charNumericValue = character;
        var exponentialWeight = startPoint - i;
        var weightedValue = new BigInteger(charNumericValue * Math.Pow(_alphabetLength, exponentialWeight));
        value += weightedValue;
    }
    return value;
}
Using the above to convert the following:
var words = new [] {"bob", "alice", "john" };
420901224533 // bob
-9223372036854775808 // alice
29835458486206476 // john
Despite the overflow, the output looks sorted to me. I need to improve this and test it properly, but at least it is a start.
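One possible refinement, sketched here under assumptions rather than taken from the answer above: if every word is treated as a fixed-width number in base 65536 (one digit per char, padded on the right with zero digits), comparing the resulting numbers gives the same order as an ordinal comparison of the words. WordNumber, Encode and the width parameter are hypothetical names for this illustration, and the words are assumed not to contain the '\0' character. Pure BigInteger arithmetic is used instead of Math.Pow, so no precision is lost and nothing overflows.

```csharp
using System;
using System.Numerics;

public static class WordNumber
{
    // Number of distinct char values (0..65535), used as the numeric base.
    private static readonly BigInteger Base = 65536;

    // Encodes word as a base-65536 number with exactly width digits.
    // Shorter words are padded on the right with the digit 0.
    public static BigInteger Encode(string word, int width)
    {
        if (word.Length > width)
            throw new ArgumentException("word is longer than width", nameof(word));

        BigInteger value = BigInteger.Zero;
        for (int i = 0; i < width; i++)
        {
            int digit = i < word.Length ? word[i] : 0;
            value = value * Base + digit;
        }
        return value;
    }
}
```

With a width of 5, the numbers for "alice", "bob" and "john" compare in that order, and "bob" sorts before "bobby". The values grow as 65536 to the power of the width, so they are far too large to index a Counting Sort array directly; the sketch only shows that an order-preserving numeric representation exists.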

C# Find list string element that suffix of them is greater than others

I have a list of strings:
["a1","b0","c0","a2","c1","d3","a3"].
I want to get the list ["a3","d3","c1","b0"] based on their suffixes.
Example: for "a1","a2","a3" the result is "a3".
This question may be simple, but I can't solve it.
Thanks for any help!
The following LINQ statement does what you need.
var result = input.Select(x => new { letter = x[0], number = x[1], item = x }) // Separate letter & number.
    .GroupBy(x => x.letter) // Group on letter and take first element (of max number)
    .Select(x => x.OrderByDescending(o => o.number).First())
    .OrderByDescending(x => x.number) // Order on number.
    .Select(x => x.item) // get the item.
    .ToArray();
Output
[a3, d3, c1, b0]
Check this Example
Below is an alternative. It's quite long, mainly because I try to explain every line.
// create list based on your original text
var list = new List<string> { "a1", "b0", "c0", "a2", "c1", "d3", "a3" };
// use a dictionary to hold the prefix and max suffixes
var suffixMaxDictionary = new Dictionary<string, int>();
// loop through the list
for (int i = 0; i < list.Count; i++)
{
    // get the prefix using Substring()
    var prefix = list[i].Substring(0, 1);
    // if the prefix already exist in the dictionary then skip it, it's already been processed
    if (suffixMaxDictionary.ContainsKey(prefix))
        continue; // continue to the next item
    // set the max suffix to 0, so it can be checked against
    var suffixMax = 0;
    // loop through the whole list again to get the suffixes
    for (int j = 0; j < list.Count; j++)
    {
        // get the current prefix in the second loop of the list
        var thisprefix = list[j].Substring(0, 1);
        // if the prefixes don't match, then skip it
        // e.g. prefix = "a" and thisprefix = "b", then skip it
        if (prefix != thisprefix)
            continue;
        // get the suffix
        // warning though, it assumes 2 things:
        // 1. that the second character is a number
        // 2. there will only ever be numbers 0-9 as the second character
        var thisSuffix = Convert.ToInt32(list[j].Substring(1, 1));
        // check the current suffix number (thisSuffix) compared the suffixMax value
        if (thisSuffix > suffixMax)
        {
            // if thisSuffix > suffixMax, set suffixMax to thisSuffix
            // and it will now become the new max value
            suffixMax = thisSuffix;
        }
    }
    // add the prefix and the max suffix to the dictionary
    suffixMaxDictionary.Add(prefix, suffixMax);
}
// print the values to the console
Console.WriteLine("original: \t" + string.Join(",", list));
Console.WriteLine("result: \t" + string.Join(",", suffixMaxDictionary));
Console.ReadLine();
See also https://dotnetfiddle.net/BmvFEp. Thanks @Hari Prasad, I didn't know there was a fiddle for .NET.
This will give you the first instance of the largest "suffix" as described in the question:
string[] test = { "a3", "d3", "c1", "b0" };
string testResult = test.FirstOrDefault(s => s.Last() == test.Max(t => t.Last()));
In this case the result is "a3".
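The answers above compare a single suffix character, which only works for suffixes 0-9. If the suffix can have several digits, a sketch along the following lines may help; it assumes every item is exactly one letter followed by a non-negative integer, and SuffixMax and MaxPerPrefix are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SuffixMax
{
    // Assumption: each item is one letter followed by a non-negative integer, e.g. "a12".
    public static List<string> MaxPerPrefix(IEnumerable<string> input)
    {
        return input
            .Select(s => new { Item = s, Letter = s[0], Number = int.Parse(s.Substring(1)) })
            .GroupBy(x => x.Letter)
            // Keep the entry with the largest numeric suffix in each group.
            .Select(g => g.OrderByDescending(x => x.Number).First())
            // Largest suffix first; the sort is stable, so ties keep the group order.
            .OrderByDescending(x => x.Number)
            .Select(x => x.Item)
            .ToList();
    }
}
```

Parsing the suffix as an int means "a10" ranks above "a9", which a character comparison would get wrong.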

Replacing data from a list with its position

I have written some code that lets the user enter a sentence into a RichTextBox; the words are then saved into a list and the duplicates removed.
What I want to know is how to overwrite the words in the list with their positions, and then replace the words of the original sentence with those positions.
E.g. the sentence Hello this is a test I hope this test works will be saved, stripped of duplicates, and output as hello, this, is, a, test, I, hope, works; the code then replaces these with 1 2 3 4 5 6 7 8 (I think).
Now I need to make the program replace each word of the original sentence with its position in the list, so that it finally says 1 2 3 4 5 6 7 2 5 8, separated by commas.
This is my code:
string sentence = richTextBox1.Text;
list = sentence.Split(delimiterChars).ToList();
listoriginal = sentence.Split(delimiterChars).ToList();
listBox1.Items.Add("Full sentence: " + String.Join(" ", list));
list = list.Distinct(StringComparer.InvariantCultureIgnoreCase).ToList();
listBox1.Items.Add("Words in the input: " + String.Join(", ", list));
for (int i = 0; i < list.Count; i++)
{
    list[i] = list[i].ToString();
    listoriginal[i] = listoriginal[i].ToString();
    resultList = listoriginal.Select(x => x.Replace(listoriginal[i], list[i])).ToList();
    i++;
}
listBox1.Items.Add("Final result: " + String.Join(", ", resultList));
Create a dictionary of the words that have already been seen, meaning that you would have the following:
Dictionary<int, string> words = new Dictionary<int, string>();
words.Add(number, word); // number is the position, word is the word itself
(1, Hello)
(2, this)
(3, is)
(4, a)
.....and so on
You can then write a method that searches to see whether a word is already stored, for example:
foreach (KeyValuePair<int, string> set in words)
{
    if (set.Value == nextWord) // nextWord: whatever word is next
    {
        // write the number in the dictionary here
        // the corresponding number can be grabbed using: set.Key
    }
}
Basically you need to populate on the fly a Dictionary<string, int> holding the result position of the word. The result words then can be obtained from Keys property:
var originalWords = sentence.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries);
var uniqueWordPositions = new Dictionary<string, int>(StringComparer.InvariantCultureIgnoreCase);
var originalWordPositions = new List<int>();
foreach (var word in originalWords)
{
    int position;
    if (!uniqueWordPositions.TryGetValue(word, out position))
        uniqueWordPositions.Add(word, position = uniqueWordPositions.Count + 1);
    originalWordPositions.Add(position);
}
listBox1.Items.Add("Full sentence: " + string.Join(" ", originalWords));
listBox1.Items.Add("Words in the input: " + string.Join(", ", uniqueWordPositions.Keys));
listBox1.Items.Add("Final result: " + string.Join(", ", originalWordPositions));
The easiest way is probably to use a key/value store like a dictionary - once you have eliminated the duplicates you can push the key/values into the dictionary and use that to re-build the sentence.
// Create dictionary
var dict = new Dictionary<string, int>();
// When looping through the words to determine index add each word to the dict - the word being the key and the value being the index
dict.Add(word, index);
Then
foreach (var word in words) {
    // Get the value from the dictionary for the associated key (the key being the word)
    var index = dict[word];
    // do stuff with index
}
You can use string concatenation to re-build the string rather than replace (since you are indexing all words).
The common advice is to use StringBuilder when you are concerned about performance/memory, since strings are immutable:
var sb = new StringBuilder();
foreach (var word in words) {
    sb.Append(dict[word]);
    sb.Append(" ");
}
sb.ToString();
Edit:
Here's a more full example...
var sentence = "hello world this is a test hello world";
var words = sentence.Split(' ');
var distinctWords = words.Distinct(StringComparer.InvariantCultureIgnoreCase);
var dict = new Dictionary<string, int>();
var ix = 0;
foreach (var word in distinctWords)
{
    dict.Add(word.ToLower(), ix++);
}
var sb = new StringBuilder();
foreach (var word in words)
{
    sb.Append(dict[word.ToLower()]);
    sb.Append(" ");
}
// sb.ToString();
// 0 1 2 3 4 5 0 1
Obviously since I'm not doing a replace, this may affect the formatting of the original string but it gives you an idea. You can use replace but it will be a lot slower - but it depends on the length of string and how much processing you are doing.
