Replace empty string with null in array efficiently - c#

I want to know the most efficient way of replacing empty strings in an array with null values.
I have the following array:
string[] _array = new string [10];
_array[0] = "A";
_array[1] = "B";
_array[2] = "";
_array[3] = "D";
_array[4] = "E";
_array[5] = "F";
_array[6] = "G";
_array[7] = "";
_array[8] = "";
_array[9] = "J";
and I am currently replacing empty strings by the following:
for (int i = 0; i < _array.Length; i++)
{
if (_array[i].Trim() == "")
{
_array[i] = null;
}
}
which works fine on small arrays but I'm chasing some code that is the most efficient at doing the task because the arrays I am working with could be much larger and I would be repeating this process over and over again.
Is there a linq query or something that is more efficient?

You might consider replacing _array[i].Trim() == "" with string.IsNullOrWhiteSpace(_array[i]) to avoid allocating a new string. But that's pretty much all you can do to make it faster while keeping it sequential. LINQ will not be faster than a for loop.
You could try making your processing parallel, but that seems like a bigger change, so you should evaluate if that's ok in your scenario.
Parallel.For(0, _array.Length, i => {
    if (string.IsNullOrWhiteSpace(_array[i]))
    {
        _array[i] = null;
    }
});

As far as efficiency goes, it is fine, but it also depends on how large the array is and how often you iterate over such arrays. The main problem I see is that you could get a NullReferenceException from the Trim call when an entry is already null. A better approach is to use string.IsNullOrEmpty or string.IsNullOrWhiteSpace; the latter is more along the lines of what you want, but it is not available in all versions of .NET (it was introduced in .NET Framework 4.0).
for (int i = 0; i < _array.Length; i++)
{
if (string.IsNullOrWhiteSpace(_array[i]))
{
_array[i] = null;
}
}
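The loop above can be wrapped in a small helper so that it is easy to call repeatedly. This is only a sketch: the class name ArrayCleaner and the method name NullOutBlanks are made up for illustration, and returning a count is an extra that the original answer does not mention.

```csharp
using System;

static class ArrayCleaner
{
    // Replaces empty or whitespace-only entries with null, in place.
    // Returns the number of entries that are null after the call
    // (entries that were already null are counted too).
    public static int NullOutBlanks(string[] items)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));

        int cleared = 0;
        for (int i = 0; i < items.Length; i++)
        {
            // IsNullOrWhiteSpace allocates nothing and is safe for null entries.
            if (string.IsNullOrWhiteSpace(items[i]))
            {
                items[i] = null;
                cleared++;
            }
        }
        return cleared;
    }
}
```

Calling ArrayCleaner.NullOutBlanks(_array) modifies _array directly, so no second array is allocated.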

LINQ is mainly used for querying, not for assignment. If you use a List<string> instead of an array, List.ForEach looks tempting, but assigning to the lambda parameter only changes a local copy of the reference and leaves the list untouched. A one-line projection that builds a new list does work:
_list = _list.Select(x => string.IsNullOrWhiteSpace(x) ? null : x).ToList();

A linq query will do essentially the same thing behind the scenes so you aren't going to gain any real efficiency simply by using linq.
When determining something more efficient, look at a few things:
How big will your array grow?
How often will the data in your array change?
Does the order of your array matter?
You've already answered that your array might grow to large sizes and performance is a concern.
So, looking at questions 2 and 3 together: if the order of your data doesn't matter, you could keep the array sorted so that the empty strings come first, and break out of the loop as soon as you reach a non-empty string.
Ideally, you would be able to check the data on the way in so you don't have to constantly loop over your entire array. Is that not a possibility?
Hope this at least gets some thoughts going.
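The idea of checking the data on the way in can be sketched as a thin wrapper around the array. The NormalizedStrings class below is hypothetical and assumes all writes can be routed through its indexer; whether that is possible depends on how the array is filled in your code.

```csharp
using System;

// Hypothetical wrapper: blank values are converted to null at the moment
// they are stored, so no later pass over the whole array is needed.
sealed class NormalizedStrings
{
    private readonly string[] _items;

    public NormalizedStrings(int size)
    {
        _items = new string[size];
    }

    public int Length
    {
        get { return _items.Length; }
    }

    public string this[int index]
    {
        get { return _items[index]; }
        // The check costs one call per write instead of one full scan per cleanup.
        set { _items[index] = string.IsNullOrWhiteSpace(value) ? null : value; }
    }
}
```

With this in place, storing "" or "   " at any index reads back as null, and "A" reads back unchanged.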

It's ugly, but you can eliminate the CALL instruction to the RTL, as I mentioned earlier, with this code:
string value = _array[i]; // inside the for loop over i
if (value != null) {
    bool blank = true;
    for (int j = 0; j < value.Length; j++) {
        if (!char.IsWhiteSpace(value[j])) {
            blank = false;
            break;
        }
    }
    if (blank) {
        _array[i] = null;
    }
}
But it adds an extra assignment and an extra condition, and it is just too ugly for me. Still, if you want to shave nanoseconds off processing a massive list, perhaps this could be used. I like the idea of parallel processing, and you could wrap this with Parallel.

Use the code below (string.IsNullOrEmpty avoids a NullReferenceException on entries that are already null):
_array = _array.Select(str => string.IsNullOrEmpty(str) ? null : str).ToArray();

Related

How to compare a fixed value against multiple values and find if any one fails to match

I have a fixed int value - 1050. I have around 50 dynamic values that I want to compare with the fixed value. So I compare it in a for loop. I have a public variable which I set as ok or notok depending on result. But my problem is that the value of the public variable is always the last value that I compared. Example, If I have the 20th dynamic value as 1000, it should return notok, but the value of the variable is always the last compared value. How do I set the variable to notok even if one/multiple of the values of dynamic variable doesnt match with fixed variable? I also display the total number of notok values in a listbox.
Here is what I have:
string result;
for(int i = 0; i < dynamicvalue.count; i++)
{
if(dynamicvalue[i] != setvalue)
{
result = "notok";
listBox1.Items.Add(result);
}
else
{
result = "ok";
}
}

To get "notok" if there's at least one value that doesn't match, one way to do it in plain code:
string result = "ok";
for(int i=0; i<dynamicvalue.count; ++i)
{
if(dynamicvalue[i] != setvalue)
{
result = "notok";
break;
}
}

You can use .Any() from LINQ, which
determines whether any element of a sequence exists or satisfies a
condition.
string result = dynamicvalue.Any(x => x != setvalue) ? "Not Ok" : "Ok";
If you use a for loop without a break statement, the loop keeps iterating after the first mismatch has been found, which is wasted work.
I would not recommend it, but if you want, you can try the code below:
string result = "Ok";
bool flag = true;
//This for loop will iterate for n times.
for(int i = 0; i < dynamicvalue.Count; i++)
{
if(dynamicvalue[i] != setvalue && flag)
{
result = "Not Ok";
flag = false; //flag will help us to execute this block of code only once.
}
}
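The question also asks for the total number of non-matching values, which neither Any() nor an early break provides. A minimal LINQ sketch that returns both pieces of information follows; the class name ValueCheck and its method names are made up for illustration, and the values are assumed to be ints as in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ValueCheck
{
    // Counts the values that differ from the expected one.
    public static int CountMismatches(IEnumerable<int> values, int expected)
    {
        return values.Count(v => v != expected);
    }

    // "ok" only when every value matches; "notok" as soon as one differs.
    public static string Summarize(IEnumerable<int> values, int expected)
    {
        return values.All(v => v == expected) ? "ok" : "notok";
    }
}
```

Summarize stops at the first mismatch, while CountMismatches always walks the whole sequence, so call the latter only when the count is actually needed.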

Perhaps the most performant way to answer this would be to keep your numbers in a HashSet instead (make dynamicvalue a HashSet<int>), then it's:
dynamicvalue.Contains(setvalue) ? "ok" : "notok"
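A HashSet can answer "do you contain this value?" much more quickly than a list or array can. Note, however, that Contains only tells you whether setvalue is present; it does not tell you whether some other value differs from it.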

By the discussion going on in the comments I'm thinking that you want to go through all the elements in dynamicvalue and check all if any of them are ok or notok. If that is the case, you should turn result into an array. You get the last compared result because each time the cycle loops, the string gets assigned a new value all over again so the previous value gets discarded.
Is this what you want to do? I wrote it in C++:
int setvalue = 1050;
int notok = 0;
int dynamicvalue[5] = {1, 2, 3, 1050, 4}; // for example
string result[5];
// sizeof(dynamicvalue) is the size in bytes, so divide by the element size to get the count
const int count = sizeof(dynamicvalue) / sizeof(dynamicvalue[0]);
for (int i = 0; i < count; i++) {
    if (dynamicvalue[i] != setvalue) {
        result[i] = "notok";
        notok++; // to keep track of notok
    }
    else {
        result[i] = "ok";
    }
}
Afterwards if you cycle through the result array you will see that all the values were saved. I find it simpler to have an int variable to know how many times the result was notok

You forgot to get the actual value within dynamicvalue: your test should be if (dynamicvalue[i] != setvalue).
EDIT: And add a break; after the result="ok"; instruction to break the loop, too.
EDIT 2: An above answer gives the corrected code using a break.

I found a solution to this by reading @deminalla's answer.
I added two more integers to work as counters and after the for loop I compare the values of these integers to get the final result.
Here's what I did:
string result;
int okcounter = 0;
int notokcounter = 0;
for(int i = 0; i < dynamicvalue.count; i++)
{
if(dynamicvalue[i] != setvalue)
{
notokcounter ++;
listBox1.Items.Add(notokcounter);
}
else
{
okcounter++;
}
}
if(notokcounter >=1)
{
result = "notok";
}
else if(okcounter == dynamicvalue.count)
{
result = "ok";
}

Can be certain items in an array be compared for something to be true - C#

I have been having trouble getting this code to work. Is there something wrong with it, or am I just doing it wrong?
string a = teams[1];
string b = wins[1];
int numWins = 0;
while (o < wins.Length)
{
if (a != b)
{
numWins++;
}
o++;
}
numOfWinsLabel.Text = numWins.ToString();
It is adding to the counter even when both of them are equal in the text files that I have set up.
Can someone please help me?

You seem to want to walk through the arrays, and you seem to try to use the o variable to point to each element. But you're not "telling" the arrays to actually use the o variable.
What happens in that case is that the variables get assigned the values of index 1 and they never change. It's also worth noting that arrays usually start at index 0.
Try the following:
int numWins = 0;
for (int o = 0; o < wins.Length; o++)
{
string a = teams[o];
string b = wins[o];
if (a != b)
{
numWins++;
}
}
numOfWinsLabel.Text = numWins.ToString();
This of course also assumes that the teams array has at least as many elements as the wins array, or you'll get an exception.
I've changed the while to a for as it's more suitable for situations where you already know the number of times you want to loop.
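If the two arrays might differ in length, one way to avoid the exception mentioned above is to loop only over the indexes that exist in both. This is a sketch, not part of the original answer; the names WinCounter and CountDifferences are hypothetical, and silently ignoring the extra elements is an assumption that may not suit every case.

```csharp
using System;

static class WinCounter
{
    // Counts the positions where the two arrays hold different strings,
    // looking only at indexes that exist in both arrays.
    public static int CountDifferences(string[] teams, string[] wins)
    {
        int limit = Math.Min(teams.Length, wins.Length);
        int count = 0;
        for (int i = 0; i < limit; i++)
        {
            if (teams[i] != wins[i])
            {
                count++;
            }
        }
        return count;
    }
}
```

The result can then be shown with numOfWinsLabel.Text = WinCounter.CountDifferences(teams, wins).ToString();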

How do I check for duplicate answers in this array? c#

Sorry for the newbie question. Could someone help me out? Simple array here. What's the best/easiest method to check all the user input is unique and not duplicated? Thanks
private void btnNext_Click(object sender, EventArgs e)
{
string[] Numbers = new string[5];
Numbers[0] = txtNumber1.Text;
Numbers[1] = txtNumber2.Text;
Numbers[2] = txtNumber3.Text;
Numbers[3] = txtNumber4.Text;
Numbers[4] = txtNumber5.Text;
foreach (string Result in Numbers)
{
lbNumbers.Items.Add(Result);
}
txtNumber1.Clear();
txtNumber2.Clear();
txtNumber3.Clear();
txtNumber4.Clear();
txtNumber5.Clear();
}
I should have added that I need the check to happen before the numbers are output. Thanks

One simple approach is via LINQ:
bool allUnique = Numbers.Distinct().Count() == Numbers.Length;

Another approach is using a HashSet<string>:
var set = new HashSet<string>(Numbers);
if (set.Count == Numbers.Length)
{
// all unique
}
or with Enumerable.All:
var set = new HashSet<string>();
// HashSet.Add returns true only if the item was added, i.e. it was not already in the set
bool allUnique = Numbers.All(text=> set.Add(text));
Enumerable.All is more efficient when the sequence is very large, since it does not build the complete set up front; it adds items one at a time and returns false as soon as it detects a duplicate.
Here's a demo of this effect: http://ideone.com/G48CYv
HashSet constructor memory consumption: 50 MB, duration: 00:00:00.2962615
Enumerable.All memory consumption: 0 MB, duration: 00:00:00.0004254
From MSDN:
The HashSet<T> class provides high-performance set operations.
A set is a collection that contains no duplicate elements, and whose
elements are in no particular order.

The easiest way, in my opinion, would be to insert all values inside a set and then check if its size is equal to the array's size. A set can't contain duplicate values, so if any value is duplicate, it won't be inserted into the set.
This is also fine in terms of complexity if you don't have millions of values: insertion into a tree-based set (such as SortedSet<T>) takes O(log n) time, so the total check time will be O(n log n).
If you want something optimal in complexity, you can do this in O(n) time by going through the array and putting each value into a hash map while tracking its count: if the value doesn't exist in the map, you add it with count = 1. If it does exist, you increment its count.
Then, you go through the hash map and check that all values have a count of one.
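The hash map approach described above could look roughly like this in C#, using Dictionary<string, int>. The names DuplicateFinder and FindDuplicates are made up for illustration, and the sketch assumes none of the values is null (a null key would make the dictionary throw), which holds for TextBox.Text values.

```csharp
using System;
using System.Collections.Generic;

static class DuplicateFinder
{
    // Returns every value that occurs more than once, together with its count.
    // An empty result means all values are unique.
    public static Dictionary<string, int> FindDuplicates(IEnumerable<string> values)
    {
        var counts = new Dictionary<string, int>();
        foreach (string v in values)
        {
            int seen;
            counts.TryGetValue(v, out seen); // seen stays 0 when v is new
            counts[v] = seen + 1;
        }

        var duplicates = new Dictionary<string, int>();
        foreach (KeyValuePair<string, int> pair in counts)
        {
            if (pair.Value > 1)
            {
                duplicates[pair.Key] = pair.Value;
            }
        }
        return duplicates;
    }
}
```

In the button handler, checking FindDuplicates(Numbers).Count == 0 before the foreach would tell you whether it is safe to add the values to the list box.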

If you are just trying to make sure that your listbox doesn't have dups then use this:
if(!lbNumbers.Items.Contains(Result))
lbNumbers.Items.Add(Result);

What about this:
public bool arrayContainsDuplicates(string[] array) {
    // i stops one short of the end; j must reach the last element
    for (int i = 0; i < array.Length - 1; i++) {
        for (int j = i + 1; j < array.Length; j++) {
            if (array[i] == array[j]) return true;
        }
    }
    return false;
}

Initializing an integer array

string dosage = "2/3/5 mg";
string[] dosageStringArray = dosage.Split('/');
int[] dosageIntArray = null;
for (int i = 0; i <= dosageStringArray.Length; i++)
{
if (i == dosageStringArray.Length)
{
string[] lastDigit = dosageStringArray[i].Split(' ');
dosageIntArray[i] = Common.Utility.ConvertToInt(lastDigit[0]);
}
else
{
dosageIntArray[i] = Common.Utility.ConvertToInt(dosageStringArray[i]);
}
}
I am getting the exception on this line: dosageIntArray[i] = Common.Utility.ConvertToInt(dosageStringArray[i]);
I am unable to resolve this issue. Not getting where the problem is. But this line int[] dosageIntArray = null; is looking suspicious.
Exception: Object reference not set to an instance of an object.

The biggest problem with your solution is not the missing array allocation, but rather how
you'd parse input such as the following:
string dosage = "2/13/5 mg";
Since your problem is surely domain specific, this exact case may not arise, but some variation of it, such as two digits representing a single integer, might.
The following solution splits the string on forward slash, then removes any non-digits from the substrings before converting them to integers.
Regex digitsOnly = new Regex(@"[^\d]");
var array = dosage.Split('/')
.Select(num => int.Parse(digitsOnly.Replace(num, string.Empty)))
.ToArray();
Or whatever that looks like with the cuddly LINQ syntax.

You are looking for something like
int[] dosageIntArray = new int[dosageStringArray.Length];

You are trying to access a null array (dosageIntArray) here:
dosageIntArray[i] = Common.Utility.ConvertToInt(lastDigit[0]);
You need to initialize it before you can access it like that.

You have to allocate dosageIntArray like this:
int[] dosageIntArray = new int[dosageStringArray.Length];
Also, you have another bug in your code:
Index of last element of an array is Length - 1.
Your for statement should read as:
for (int i = 0; i < dosageStringArray.Length; i++)
or
for (int i = 0; i <= (dosageStringArray.Length - 1); i++)
The former is preferred and is the most common style you will see.

I strongly recommend you use Lists instead of Arrays. You don't need to define the size of the List; just add items to it. It's very functional and much easier to use.
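A List-based version of the parsing might look like the sketch below. It uses int.Parse in place of the asker's Common.Utility.ConvertToInt helper, which is not available here, and it assumes each part starts with the number, optionally followed by a space and a unit. The names DosageParser and Parse are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

static class DosageParser
{
    // Parses input such as "2/3/5 mg" into the list 2, 3, 5.
    public static List<int> Parse(string dosage)
    {
        var result = new List<int>(); // grows as needed, no size required up front
        foreach (string part in dosage.Split('/'))
        {
            // Keep only the text before the first space, e.g. "5 mg" -> "5".
            string number = part.Trim().Split(' ')[0];
            result.Add(int.Parse(number));
        }
        return result;
    }
}
```

If an array is still needed afterwards, DosageParser.Parse(dosage).ToArray() converts the list. Note that int.Parse throws a FormatException on input it cannot read; int.TryParse is the alternative if malformed input is expected.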

As an alternative approach:
var dosage = "2/3/5 mg";
int[] dosageIntArray = Regex.Matches(dosage, @"\d+")
    .Cast<Match>() // MatchCollection is non-generic on .NET Framework
    .Select(m => int.Parse(m.Value))
    .ToArray();

Testing for repeated characters in a string

I'm doing some work with strings, and I have a scenario where I need to determine if a string (usually a small one < 10 characters) contains repeated characters.
`ABCDE` // does not contain repeats
`AABCD` // does contain repeats, ie A is repeated
I can loop through the string.ToCharArray() and test each character against every other character in the char[], but I feel like I am missing something obvious.... maybe I just need coffee. Can anyone help?
EDIT:
The string will be sorted, so order is not important so ABCDA => AABCD
The frequency of repeats is also important, so I need to know if the repeat is pair or triplet etc.

If the string is sorted, you could just remember each character in turn and check to make sure the next character is never identical to the last character.
Other than that, for strings under ten characters, just testing each character against all the rest is probably as fast or faster than most other things. A bit vector, as suggested by another commenter, may be faster (helps if you have a small set of legal characters.)
Bonus: here's a slick LINQ solution to implement Jon's functionality:
int longestRun =
s.Select((c, i) => s.Substring(i).TakeWhile(x => x == c).Count()).Max();
So, OK, it's not very fast! You got a problem with that?!
:-)

If the string is short, then just looping and testing may well be the simplest and most efficient way. I mean you could create a hash set (in whatever platform you're using) and iterate through the characters, failing if the character is already in the set and adding it to the set otherwise - but that's only likely to provide any benefit when the strings are longer.
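The hash set idea described above can be sketched with HashSet<char>. The class and method names are made up for illustration; unlike the sorted-string versions, this does not assume any particular character order.

```csharp
using System;
using System.Collections.Generic;

static class CharSetCheck
{
    // Returns true as soon as a character is seen for the second time.
    public static bool HasRepeatedChar(string text)
    {
        var seen = new HashSet<char>();
        foreach (char c in text)
        {
            // Add returns false when c is already in the set.
            if (!seen.Add(c))
            {
                return true;
            }
        }
        return false;
    }
}
```

This only answers whether a repeat exists; it does not report how often each character occurs.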
EDIT: Now that we know it's sorted, mquander's answer is the best one IMO. Here's an implementation:
public static bool IsSortedNoRepeats(string text)
{
if (text.Length == 0)
{
return true;
}
char current = text[0];
for (int i=1; i < text.Length; i++)
{
char next = text[i];
if (next <= current)
{
return false;
}
current = next;
}
return true;
}
A shorter alternative if you don't mind repeating the indexer use:
public static bool IsSortedNoRepeats(string text)
{
for (int i=1; i < text.Length; i++)
{
if (text[i] <= text[i-1])
{
return false;
}
}
return true;
}
EDIT: Okay, with the "frequency" side, I'll turn the problem round a bit. I'm still going to assume that the string is sorted, so what we want to know is the length of the longest run. When there are no repeats, the longest run length will be 0 (for an empty string) or 1 (for a non-empty string). Otherwise, it'll be 2 or more.
First a string-specific version:
public static int LongestRun(string text)
{
if (text.Length == 0)
{
return 0;
}
char current = text[0];
int currentRun = 1;
int bestRun = 0;
for (int i=1; i < text.Length; i++)
{
if (current != text[i])
{
bestRun = Math.Max(currentRun, bestRun);
currentRun = 0;
current = text[i];
}
currentRun++;
}
// It's possible that the final run is the best one
return Math.Max(currentRun, bestRun);
}
Now we can also do this as a general extension method on IEnumerable<T>:
// Must be declared inside a static class to work as an extension method.
public static int LongestRun<T>(this IEnumerable<T> source)
{
    bool first = true;
    T current = default(T);
    int currentRun = 0;
    int bestRun = 0;
    foreach (T element in source)
    {
        if (first || !EqualityComparer<T>.Default.Equals(element, current))
        {
            first = false;
            bestRun = Math.Max(currentRun, bestRun);
            currentRun = 0;
            current = element;
        }
        currentRun++;
    }
    // It's possible that the final run is the best one
    return Math.Max(currentRun, bestRun);
}
Then you can call "AABCD".LongestRun() for example.

This will tell you very quickly if a string contains duplicates:
string s = "ABCDEA";
bool containsDups = s.Length != s.Distinct().Count();
It just checks the number of distinct characters against the original length. If they're different, you've got duplicates...
Edit: I guess this doesn't take care of the frequency of dups you noted in your edit though... but some other suggestions here already take care of that, so I won't post the code as I note a number of them already give you a reasonably elegant solution. I particularly like Joe's implementation using LINQ extensions.

Since you're using 3.5, you could do this in one LINQ query:
var results = stringInput
.ToCharArray() // not actually needed, I've left it here to show what's actually happening
.GroupBy(c=>c)
.Where(g=>g.Count()>1)
.Select(g=>new {Letter=g.First(),Count=g.Count()})
;
For each character that appears more than once in the input, this will give you the character and the count of occurrences.

I think the easiest way to achieve that is to use this simple regex:
bool foundMatch = false;
foundMatch = Regex.IsMatch(yourString, @"(\w)\1");
If you need more information about the match (start, length, etc.):
Match match = null;
string testString = "ABCDE AABCD";
match = Regex.Match(testString, @"(\w)\1+?");
if (match.Success)
{
    string matchText = match.Value; // AA
    int matchIndex = match.Index; // 6
    int matchLength = match.Length; // 2
}

How about something like:
string strString = "AA BRA KA DABRA";
var grp = from c in strString.ToCharArray()
group c by c into m
select new { Key = m.Key, Count = m.Count() };
foreach (var item in grp)
{
Console.WriteLine(
string.Format("Character:{0} Appears {1} times",
item.Key.ToString(), item.Count));
}

Update: now that the frequency matters, you'd need an array of counters to maintain a count.
Keep a bit array, with one bit representing a unique character. Run over the string once and turn the bit on when you encounter a character. The mapping between bit array indexes and the character set is up to you to decide. Break if you see that a particular bit is already on.
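Both ideas above can be sketched as follows. The mapping chosen here is an assumption made for illustration: only the uppercase letters 'A' to 'Z' are supported, so 26 bits fit in a single int. The names RepeatCheck, HasRepeat and CountLetters are hypothetical.

```csharp
using System;

static class RepeatCheck
{
    // Bit-vector variant: one bit per letter, stops at the first repeat.
    public static bool HasRepeat(string text)
    {
        int seen = 0;
        foreach (char c in text)
        {
            int bit = 1 << IndexOf(c);
            if ((seen & bit) != 0)
            {
                return true; // bit already on: repeated character
            }
            seen |= bit;
        }
        return false;
    }

    // Counter-array variant for when the frequency of each letter matters.
    // Index 0 holds the count for 'A', index 25 the count for 'Z'.
    public static int[] CountLetters(string text)
    {
        int[] counts = new int[26];
        foreach (char c in text)
        {
            counts[IndexOf(c)]++;
        }
        return counts;
    }

    // Maps 'A'..'Z' to 0..25 and rejects everything else.
    private static int IndexOf(char c)
    {
        if (c < 'A' || c > 'Z')
        {
            throw new ArgumentException("Only the letters A-Z are supported in this sketch.");
        }
        return c - 'A';
    }
}
```

A count of 2 in the array means a pair and 3 a triplet, which covers the frequency requirement from the question's edit.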

/(.).*\1/
(or whatever the equivalent is in your regex library's syntax)
Not the most efficient, since it will probably backtrack to every character in the string and then scan forward again. And I don't usually advocate regular expressions. But if you want brevity...

I started looking for some info on the net and I got to the following solution.
string input = "aaaaabbcbbbcccddefgg";
char[] chars = input.ToCharArray();
Dictionary<char, int> dictionary = new Dictionary<char,int>();
foreach (char c in chars)
{
if (!dictionary.ContainsKey(c))
{
dictionary[c] = 1; // first occurrence of this character
}
else
{
dictionary[c]++;
}
}
foreach (KeyValuePair<char, int> combo in dictionary)
{
if (combo.Value > 1) // If the value for the key is greater than 1, the letter is repeated
{
Console.WriteLine("Letter " + combo.Key + " " + "is repeated " + combo.Value.ToString() + " times");
}
}
I hope it helps. I had a job interview in which the interviewer asked me to solve this, and I understand it is a common question.

When there is no order to work on you could use a dictionary to keep the counts:
String input = "AABCD";
var result = new Dictionary<Char, int>(26);
var chars = input.ToCharArray();
foreach (var c in chars)
{
if (!result.ContainsKey(c))
{
result[c] = 0; // initialize the counter in the result
}
result[c]++;
}
foreach (var charCombo in result)
{
Console.WriteLine("{0}: {1}",charCombo.Key, charCombo.Value);
}

The hash solution Jon was describing is probably the best. You could use a HybridDictionary, since that works well with both small and large data sets, with the letter as the key and the frequency as the value. (Update the frequency every time the add fails or the HybridDictionary returns true for .Contains(key).)
