Remove rows where column contains specific text - c#

I want to remove every row in which a column value contains "?". For example, in around 100 rows the Status column holds values such as Unknown?, Error?, InProgress, Done.
So I want to remove all the rows that contain "?".
Below is the code I am using:
//I am splitting the string on the basis of the delimiter ,
var data = from val in UserData
select val.Split(',');
//Below code is not working
var filterdata = from rows in data
where rows.Contains("?")
select rows;

You forgot to invert the contains:
string[] someStringArray = new string[]
{
"\"ABC\" ,\"Error?\",\"OK\"",
"\"DEF\",\"Inprogress\",\"FINE\"",
"1,2,3",
"?,2,3",
"1,?,3",
"4,5,6"
};
//I am splitting the string on the basis of the delimiter ,
var data = from val in someStringArray
select val.Split(',');
// rows is a string[], so rows.Contains("?") only finds a column that EQUALS "?".
// Test the text of every column instead, and use "!" to select the rows WITHOUT "?".
var filterdata = from rows in data
where !rows.Any(column => column.Contains("?"))
select rows;
foreach (var item in filterdata)
{
foreach (var i in item)
{
Console.Write(i + ",");
}
Console.WriteLine();
}
return;
Result:
"DEF","Inprogress","FINE",
1,2,3,
4,5,6,
Only the rows in which no column contains a question mark are printed.
Beyond that, a wild guess: maybe you are not actually searching for question marks. A "?" is often displayed in place of a character that cannot be represented in the expected encoding.
Have a look at the character codes in your data:
var chars = someStringArray.SelectMany(s => s.Select(c => c));
foreach (var item in chars.GroupBy(g => g.ToString() + " (" + ((int)g) + ")"))
{
Console.WriteLine(item.Key + ": " + item.Count());
}
A real question mark has the code 63. If yours shows a different number, you have an encoding problem.

You wrote:
I want to remove all the rows of the data whose columns contains "?"
You can never change the input sequence using LINQ functions. So you can't remove rows from your original data using LINQ.
What you can do, is use your data to create a new sequence that doesn't contain question marks. If desired, you can replace your original data with the new sequence.
Looking at your code, it seems that UserData is a sequence of strings, each of which is expected to hold comma-separated values.
You want to split these CSV strings into their columns, but you don't want rows where any of the columns equals "?":
"A,?,B,C" => do not use this one, one of the column values equals "?"
"A,B,C" => use this one, none of the column values equal "?"
"A, Hello?, B" => use this one, although the second column contains a question mark
this second column is not equal to question mark
This is done as follows:
static readonly char[] separatorChars = new char[] {','};
const string questionMark = "?";
var rowsWithoutQuestionMarkValues = userData
// Split each line into column values, using comma as separator
.Select(line => line.Split(separatorChars))
// do not use the line if any of the columns equals the question mark
.Where(splitLine => !splitLine.Any(column => column == questionMark));
If your code might run in a culture where a question mark looks different, for instance the full-width "？", consider using an IEqualityComparer<string>:
static readonly IEqualityComparer<string> comparer = GetStringComparerForMyCulture();
var result = ...
.Where(splitLine => !splitLine.Any(column => comparer.Equals(column, questionMark)));
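The two readings of the question (a column that contains "?" versus a column that equals "?") give different results. The sketch below, written as a C# 9+ top-level program, runs both filters side by side. The sample rows in userData are made up for illustration and are not the asker's data.

```csharp
using System;
using System.Linq;

// Hypothetical sample input; these rows are invented for illustration.
string[] userData =
{
    "1,Unknown?,A",
    "2,InProgress,B",
    "3,?,C",
    "4,Done,D"
};

// Keep rows where no column CONTAINS a question mark (what the question asks for).
string[][] noColumnContains = userData
    .Select(line => line.Split(','))
    .Where(columns => !columns.Any(column => column.Contains("?")))
    .ToArray();

// Keep rows where no column EQUALS a question mark (the stricter reading).
string[][] noColumnEquals = userData
    .Select(line => line.Split(','))
    .Where(columns => !columns.Any(column => column == "?"))
    .ToArray();

// Prints the rows "2,InProgress,B" and "4,Done,D".
foreach (string[] row in noColumnContains)
    Console.WriteLine(string.Join(",", row));
```

With this input the "contains" filter keeps two rows, while the "equals" filter also keeps the row whose status is Unknown?.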

Related

C# List: Add double quotes when the field is string with LinQ [duplicate]

I don't think my question is a duplicate, because I am not asking how to convert a list into a CSV file.
I am converting a list into a comma-delimited CSV file.
However, some fields contain commas and semicolons.
A field that contains a comma gets split into at least two columns.
My code:
public void SaveToCsv<T>(List<T> listToBeConverted)
{
var lines = new List<string>();
IEnumerable<PropertyDescriptor> props = TypeDescriptor.GetProperties(typeof(T)).OfType<PropertyDescriptor>();
//Get headers
var header = string.Join(",", props.ToList().Select(x => x.Name));
//Add all headers
lines.Add(header);
//LINQ to get all row data and add commas to separate them
var valueLines = listToBeConverted.Select(row => string.Join(",", header.Split(',').Select(a => row.GetType().GetProperty(a).GetValue(row, null))));
//add row data to List
lines.AddRange(valueLines);
...
}
How do I modify the LINQ statement to add double quotes to the start and the end of a value when it is a System.String?
Use the property info for this purpose:
void SaveToCsv<T>(List<T> listToBeConverted)
{
var lines = new List<string>();
IEnumerable<PropertyDescriptor> props = TypeDescriptor.GetProperties(typeof(T)).OfType<PropertyDescriptor>();
//Get headers
var header = string.Join(",", props.ToList().Select(x => x.Name));
//Add all headers
lines.Add(header);
//LINQ to get all row data and add commas to separate them
var valueLines = listToBeConverted.Select(row => string.Join(",", props.Select(x =>
{
var property = row.GetType().GetProperty(x.Name);
if (property.PropertyType == typeof(string))
return $"\"{property.GetValue(row, null)}\"";
return property.GetValue(row, null);
})));
//add row data to List
lines.AddRange(valueLines);
...
}
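Wrapping a string in double quotes still breaks if the value itself contains a double quote. A common convention (the one described in RFC 4180) is to double any embedded quote. The sketch below shows that idea. EscapeCsvField and ToCsvLine are hypothetical helper names, not part of the answer above, and the helper quotes a field only when it needs it rather than quoting every string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: quote a field only when it contains a comma, a quote
// or a line break, and double any embedded quotes.
static string EscapeCsvField(object value)
{
    string text = Convert.ToString(value) ?? string.Empty;
    bool needsQuotes = text.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
    return needsQuotes ? "\"" + text.Replace("\"", "\"\"") + "\"" : text;
}

// Hypothetical helper: join already-escaped fields into one CSV line.
static string ToCsvLine(IEnumerable<object> fields)
{
    return string.Join(",", fields.Select(f => EscapeCsvField(f)));
}

// Prints: "Smith, John",42,"He said ""hi"""
Console.WriteLine(ToCsvLine(new object[] { "Smith, John", 42, "He said \"hi\"" }));
```

In SaveToCsv, the same helper could be applied to each property value before the string.Join call.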
CSV stands for Comma-Separated Values, but some programs (Excel, for example) use the system's default list separator instead of a comma.
Use the system separator for columns and the end-of-line for rows.
You can find your system default in this way:
Open Control Panel
Open Clock and Region
Click on Additional settings
There you can see and change the system's default list separator
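The same setting can also be read from code instead of the Control Panel. A minimal sketch, assuming the current culture reflects the regional settings you care about:

```csharp
using System;
using System.Globalization;

// The list separator configured for the current culture (for example "," or ";").
string separator = CultureInfo.CurrentCulture.TextInfo.ListSeparator;
Console.WriteLine($"Current list separator: '{separator}'");

// A specific culture can be inspected the same way; the invariant culture uses a comma.
string invariantSeparator = CultureInfo.InvariantCulture.TextInfo.ListSeparator;
string line = string.Join(invariantSeparator, new[] { "1", "A", "V" });
Console.WriteLine(line);
```

Writing with a fixed separator such as the invariant one keeps the file identical on every machine. Writing with the current culture's separator matches what a locally installed spreadsheet program is likely to expect.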

How to search through combobox with a string containing a wildcard?

I have a combo-box that contains lots of entries like this small extract
1R09ST75057
1R11ST75070
1R15ST75086
1R23ST75090
2R05HS75063
2R05ST75063
3R05ST75086
2R07HS75086
The user now enters some information in the form, which produces a string with a wildcard (unknown) character at the second character position:
3?05ST75086
I now want to take this string and search\filter through the combo-box list and be left with this item as selected or a small set of strings.
If I know the string without the wildcard, I can use the following to select it in the combo-box:
cmbobx_axrs75.SelectedIndex = cmbobx_axrs75.Items.IndexOf("2R05HS75063");
I thought I could first build a subset of items that share the same first character, then compare a substring of each item without its first two characters. But the list can be large and this would take too long. Is there an easier way?
Any ideas how I can do this with the wildcard in the string, please?
Added info:
I want to end up with the selected item in the Combobox matching my string.
I choose from items on the form and result in string 3?05ST75086. I now want to take this and search to find which one it is and select it. So from list below
1R05ST75086
2R05ST75086
3R05ST75086
6R05ST75086
3R05GT75086
3R05ST75186
I would end up with selected item in Combo-box as
3R05ST75086
You could use regular expressions. Something like this:
string[] data = new string[]
{
"1R09ST75057",
"1R11ST75070",
"1R15ST75086",
"1R23ST75090",
"2R05HS75063",
"2R05ST75063",
"3R05ST75086",
"2R07HS75086"
};
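// In a regex, "." matches any single character ("*" would mean "zero or more of the preceding character").
string pattern = "^3.05ST75086$";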
string[] results = data
.Where(x => System.Text.RegularExpressions.Regex.IsMatch(x, pattern))
.ToArray();
You can use a regular expression for this task. First, you need a method to convert your pattern string to Regex like this (it should handle "*" and "?" wildcards):
private static string ConvertWildCardToRegex(string value)
{
return "^" + Regex.Escape(value).Replace("\\?", ".").Replace("\\*", ".*") + "$";
}
Then you will use it like the following:
List<string> comboBoxValues = new List<string>()
{
"1R09ST75057",
"1R11ST75070",
"1R15ST75086",
"1R23ST75090",
"2R05HS75063",
"2R05ST75063",
"3R05ST75086",
"2R07HS75086"
};
string searchPattern = "3?05ST75086";
string patternAsRegex = ConvertWildCardToRegex(searchPattern);
var selected = comboBoxValues.FirstOrDefault(c => Regex.IsMatch(c, patternAsRegex));
if (selected != null)
{
int selectedIndex = comboBoxValues.IndexOf(selected);
}
This assumes you only care about the first match found. If you need all matches, replace FirstOrDefault(...) with a Where(...) clause and replace the "if" statement with a foreach loop.
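The all-matches variant can be sketched as follows, as a C# 9+ top-level program. It reuses the ConvertWildCardToRegex conversion from above and the sample list from the question's added info. Constructing the Regex object once, instead of passing the pattern string on every call, is a choice made here and is not required by the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Same conversion as above: '?' matches one character, '*' matches any run of characters.
static string ConvertWildCardToRegex(string value)
{
    return "^" + Regex.Escape(value).Replace("\\?", ".").Replace("\\*", ".*") + "$";
}

List<string> comboBoxValues = new List<string>
{
    "1R05ST75086", "2R05ST75086", "3R05ST75086",
    "6R05ST75086", "3R05GT75086", "3R05ST75186"
};

// Build the Regex once rather than re-parsing the pattern for every item.
Regex matcher = new Regex(ConvertWildCardToRegex("3?05ST75086"));

List<string> matches = comboBoxValues.Where(v => matcher.IsMatch(v)).ToList();

// Prints: 3R05ST75086
foreach (string match in matches)
    Console.WriteLine(match);
```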
Thanks to all who helped. I used a combination of items from all the answers, so everyone helped me answer this.
I added this function from the answers, as it seems a good idea:
private static string ConvertWildCardToRegex(string value)
{
return "^" + Regex.Escape(value).Replace("\\?", ".").Replace("\\*", ".*") + "$";
}
Then I get the combo box items into a list. I search the list and make some more decisions based on the result of the search.
List<string> comboBoxValues = new List<string>();
for (int i = 0; i < cmbobx_in_focus.Items.Count; i++)
{
comboBoxValues.Add(cmbobx_in_focus.GetItemText(cmbobx_in_focus.Items[i]));
}
string[] results = comboBoxValues
.Where(x => Regex.IsMatch(x, ConvertWildCardToRegex(lbl_raster_used.Text)))
.ToArray();
I now have array called results which is easy to work with.

Constructing a insert query based on a LINQ query

I am trying to build a SQL insert query based on the number of filled-out text boxes returned from a LINQ query. The text boxes start at tab index 13 and end at tab index 33, and the non-empty ones are added to a list of key-value pairs. What confuses me is how to add the filled-out text boxes' values as named parameters in the insert query without getting the error that the number of query values and destination fields are not the same. Here is the code I have in place:
// use LINQ to fetch all the children textboxes based on the ones that are not empty
System.Collections.Generic.Dictionary<string, string> dictionary = tabCtrl1.TabPages["tabPage1"].Controls.OfType<TextBox>()
.Where(t => t.TabIndex >= startTabIndex && t.TabIndex <= endTabIndex && !string.IsNullOrWhiteSpace(t.Text))
.Select(x => new System.Collections.Generic.KeyValuePair<string, string>(x.Name, x.Text))
.ToDictionary(z => z.Key, z => z.Value);
// loop through all the children textboxes
// and assign them to the list members.childTextBoxes
foreach (System.Collections.Generic.KeyValuePair<string, string> kvp in dictionary)
{
members.childTextBoxes.Add(new System.Collections.Generic.KeyValuePair<string, string>($"{kvp.Key}", $"{kvp.Value}"));
}
and then the construction of the insert query:
for (int i = 0; i < members.childTextBoxes.Count; i++)
{
using (members.DBCommand = new System.Data.OleDb.OleDbCommand("INSERT INTO children (pid, childName, birthday, childEmail)" +
"VALUES (" + lastInsertId + ", @" + members.childTextBoxes[i].Key + ")", members.DBConnection))
{
members.DBCommand.Parameters.AddWithValue("@" + members.childTextBoxes[i].Key, members.childTextBoxes[i].Value);
}
// error occurs here. i'm assuming its to do
if (members.DBCommand.ExecuteNonQuery() > 0)
{
MessageBox.Show("Records inserted", "QBC", MessageBoxButtons.OK);
}
}
}
I hope this is enough information that describes the problem I am confused about.
Any help would be appreciated.
Thanks!
I can try to add more information if it helps make my question more clear.
update-
I went ahead and tried to use .Add instead of .AddWithValue but unfortunately that kept giving me an insert into query syntax error.
Here is the updated code for the insert query builder:
string fieldList = $"{string.Join(",", members.childTextBoxes.Select(tb => mapToDatabase[tb.Key]))}";
string valueList = $"{string.Join(",", members.childTextBoxes.Select(tb => "?"))}";
string insertQuery = $"INSERT INTO children {fieldList} VALUES {valueList}";
using (members.DBCommand = new System.Data.OleDb.OleDbCommand(insertQuery, members.DBConnection))
{
foreach (var field in members.childTextBoxes)
{
members.DBCommand.Parameters.Add("@" + field.Key, OleDbType.LongVarChar).Value = field.Value;
}
if (members.DBCommand.ExecuteNonQuery() > 0) // error occurs here
{
MessageBox.Show("Records inserted", "QBC", MessageBoxButtons.OK);
}
}
The best I can tell from the code you are showing in your question, you would want something like this:
var controlMap = new Dictionary<string, string>();
controlMap.Add(nameof(txtChildName), "ChildName");
controlMap.Add(nameof(txtParentName), "ParentName");
string fieldList = $"({string.Join(",", childTextBoxes.Select(tb => controlMap[tb.Key]))})" ;
string valueList = $"({string.Join(",", childTextBoxes.Select(tb => "?"))})";
string insertStatement = $"INSERT INTO children {fieldList} VALUES {valueList}";
var command = new OleDbCommand(insertStatement, members.DBConnection);
foreach (var field in childTextBoxes)
{
// OleDb binds parameters by position, so the order of these calls matters.
command.Parameters.AddWithValue("?", field.Value);
}
OleDbCommand doesn't support named parameters, so you have to use positional ones. They are marked with a "?" and must be added in the order in which they are used.
You also need to build both the field list and the values list in the SQL insert statement, so that the order of fields in your field list matches the order in which the "?" markers are populated when you add the parameters.
I haven't been able to test this, since I don't have your full setup, but it should get you pretty close. It assumes that childTextBoxes is declared as List<KeyValuePair<string, string>> since I can't see your actual declaration. You may have to adjust that a bit if it isn't correct.
Fixed the error: I had to enclose the insert values in parentheses.
string insertQuery = $"INSERT INTO children (pid, {fieldList}) VALUES (?,{valueList})";
that fixed it.
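The point about keeping the field list, the "?" markers and the parameter values in the same order can be checked without a database. The sketch below only builds the SQL text and the ordered value list. The names filledFields and parameterValues, the column names and the values are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical column names and values standing in for the non-empty text boxes.
var filledFields = new List<KeyValuePair<string, string>>
{
    new KeyValuePair<string, string>("childName", "Alice"),
    new KeyValuePair<string, string>("birthday", "2015-04-01")
};
int lastInsertId = 7; // hypothetical parent id

// Field list and placeholder list come from the same sequence, so count and order match.
string fieldList = string.Join(", ", new[] { "pid" }.Concat(filledFields.Select(f => f.Key)));
string placeholders = string.Join(", ", Enumerable.Repeat("?", filledFields.Count + 1));
string insertQuery = $"INSERT INTO children ({fieldList}) VALUES ({placeholders})";

// Values in the exact order of the placeholders; each one would be added
// to command.Parameters in this order.
List<object> parameterValues = new List<object> { lastInsertId };
parameterValues.AddRange(filledFields.Select(f => (object)f.Value));

// Prints: INSERT INTO children (pid, childName, birthday) VALUES (?, ?, ?)
Console.WriteLine(insertQuery);
```

Column names are concatenated into the SQL text, so they should come from a fixed mapping such as the control-name dictionary above, never from user input. Only the values travel as parameters.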

c# Compare 2 CSV files and delete if it exists in second file

Basically, I want to delete a row from List.csv if it exists in ListToDelete.csv and output the results to a different file named newList.csv.
List.csv
1,A,V
2,B,W
3,C,X
4,D,Y
5,E,z
ListToDelete.csv
3
4
NewList.csv
1,A,V
2,B,W
5,E,z
I understand how to use StreamReader and StreamWriter to read and write files, but I can't see how to store only the first column of List.csv so I can compare it to the first column of ListToDelete.csv.
I initially stripped out everything but the first column using the Split method to do the comparison, but I also need to copy over the other 2 columns and I can't see how to compare or loop through it correctly.
string list = "List.txt";
string listDelete = "ListToDelete.txt";
string newList = "newList.txt";
//Store all the text of each file in a string array so we can match the arrays. Using ReadAllLines instead of StreamReader so the arrays are populated automatically
var array1 = File.ReadAllLines(list);
var array2 = File.ReadAllLines(listDelete);
// Sets all the first columns from the CSV into an array
var firstcolumn = array1.Select(x => x.Split(',')[0]).ToArray();
//Matches whats in firstcolumn and array 2 to find duplicates and non duplicates
var duplicates = Array.FindAll(firstcolumn, line => Array.Exists(array2, line2 => line2 == line));
var noduplicates = Array.FindAll(firstcolumn, line => !Array.Exists(duplicates, line2 => line2 == line));
//Writes all the non duplicates to a different file
File.WriteAllLines(newList, noduplicates);
So that above code produces
1
2
5
But I also need the second and third columns to be written to the new file, so that it looks like
NewList.csv
1,A,V
2,B,W
5,E,z
You had almost done it right. The problem is that noduplicates is selected from firstcolumn, which holds only the first column {1,2,3,4,5}. noduplicates should be selected from the original list (array1), excluding the lines whose first column is one of the duplicates.
Correcting this single line as follows should fix the problem. The output has 3 rows and each row has 3 columns. The trailing comma in the comparison keeps an id such as 3 from also matching a row that starts with 30.
var noduplicates = Array.FindAll(array1, line => !Array.Exists(duplicates, line2 => line.StartsWith(line2 + ",")));
Furthermore, you don't need to parse the first column from the original array for matching. The code can be cleaned up like this
string list = "List.csv";
string listDelete = "ListToDelete.csv";
string newList = "newList.txt";
var array1 = File.ReadAllLines(list);
var array2 = File.ReadAllLines(listDelete);
var noduplicates = Array.FindAll(array1, line => !Array.Exists(array2, line2 => line.StartsWith(line2 + ",")));
//Writes all the non duplicates to a different file
File.WriteAllLines(newList, noduplicates);
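An alternative is to compare the first column exactly, which rules out any accidental prefix match (id 3 against a row starting with 30, for example). The sketch below uses in-memory arrays in place of the two File.ReadAllLines calls so that it runs on its own. The extra row "30,F,Q" is added purely to demonstrate that case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-ins for File.ReadAllLines("List.csv") and File.ReadAllLines("ListToDelete.csv").
string[] listLines = { "1,A,V", "2,B,W", "3,C,X", "4,D,Y", "5,E,z", "30,F,Q" };
string[] deleteLines = { "3", "4" };

// A HashSet gives constant-time lookups of the ids to delete.
var idsToDelete = new HashSet<string>(deleteLines.Select(id => id.Trim()));

// Keep a line only if its first column is not one of the ids to delete.
string[] kept = listLines
    .Where(line => !idsToDelete.Contains(line.Split(',')[0].Trim()))
    .ToArray();

// Prints 1,A,V then 2,B,W then 5,E,z then 30,F,Q
foreach (string line in kept)
    Console.WriteLine(line);
```

The kept array can be passed to File.WriteAllLines exactly as in the code above.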

Adding and Counting items in a list

Let's say I have a file that looks like this:
R34 128590 -74.498 109.728 0 0805_7
R33 128590 -74.498 112.014 0 0805_7
R15 128588 -68.910 127.254 0 0805_7
R32 128587 -65.354 115.189 0 0805_7
R35 128587 -65.354 117.348 0 0805_7
R38 128590 -65.354 119.507 0 0805_7
What I want to do is add the 2nd column to a list, count how many times each item occurs, and output each item together with its count.
Is there a way to do this using a List? If so, how could I go about doing that?
I have tried messing around with things and this is where I was heading, but it does not work properly:
int lineCount = 1;
int itemCounter = 0;
foreach (var item in aListBox.Items)
{
// Creates a string of the items in the ListBox.
var newItems = item.ToString();
// Replaces any multiple spaces (tabs) with a single space.
newItems = Regex.Replace(newItems, @"\s+", " ");
// Splits each line by spaces.
var eachItem = newItems.Split(' ');
###
### HERE is where I need help ###
###
List<string> partList = new List<string>();
partList.Add(eachItem[1]);
if (partList.Contains(eachItem[1]))
itemCounter++;
else
partList.Add(eachItem[1]);
sw.WriteLine(lineCount + ": "+ partList + ": " + itemCounter);
lineCount++;
}
So for the example above, it would output this:
1: 128590: 3 #lineCount, partList, itemCounter
2: 128588: 1
3: 128587: 2
Can someone help me figuring out how to properly do this?
Use LINQ with Count and group by (see the "Count - Grouped" section).
Create your partList outside the foreach loop and add each item to it inside the loop, so that it contains all of the elements:
List<string> partList = new List<string>();
foreach (var item in aListBox.Items)
{
//regex stuff here...
partList.Add(eachItem[1]);
}
(in your example: {128590, 128590, 128588, 128587, 128587, 128590})
and then use LINQ to output the result:
var elementsWithCounts = from p in partList
group p by p into g
select new { Item = g.Key, Count = g.Count()};
I would use either a LINQ query or a Dictionary,
something like:
List<string> items = new List<string>{"128590", "128590", "128588", "128587", "128587", "128590"};
Dictionary<string,int> result = new Dictionary<string,int>();
foreach( string item in items )
{
if(result.ContainsKey(item) )
result[item]++;
else
result.Add(item,1);
}
foreach( var item in result )
Console.Out.WriteLine( item.Key + ":" + item.Value );
Once you have the items split by space, I'm assuming you have a string array looking like so:
[0] = "R34"
[1] = "128590"
[2] = "-74.498"
[3] = "109.728"
[4] = "0"
[5] = "0805_7"
You can simply perform this operation with a Group By operation.
var items = aListBox.Items.Cast<object>().Select(x => Regex.Replace(x.ToString(), @"\s+", " ").Split(' ')[1]).GroupBy(x => x);
foreach(var set in items)
{
Console.WriteLine(set.Key + " appeared " + set.Count() + " times.");
}
Basically, you are trying to do this in a single pass, and that is not really going to work. You have to iterate twice; otherwise you produce output on every pass through the foreach, so even if your counts are accurate you write a new line each time. A keyed dictionary or hash table would be ideal for this (key = number, value = count). If you really need to use a List, build the list first and then summarize it. You can either use LINQ Group By (which is a bit terse) or write a function that does something similar to what you already have. If you are trying to learn the concepts, look at the code below; it could be more condensed, but it should be fairly easy to read.
List<string> partList = new List<string>();
List<string> displayedNumbers = new List<string>();
// Build the original list first.
foreach (var item in aListBox.Items)
{
// Creates a string of the items in the ListBox.
var newItems = item.ToString();
// Replaces any multiple spaces (tabs) with a single space.
newItems = Regex.Replace(newItems, @"\s+", " ");
// Splits each line by spaces.
var eachItem = newItems.Split(' ');
partList.Add(eachItem[1]);
}
// Now run through that list and count how many times the same number occurs.
// You will need two loops for this since your list is a single dimension collection.
foreach(var number in partList)
{
var innerList = partList;
// set this to zero because we are going to find at least 1 duplicate.
var count = 0;
foreach(var additionalNumber in innerList)
{
if(additionalNumber == number)
{
// If we find anymore increase the count each time.
count += 1;
}
}
// Now we have the full count of duplicates of the outer number in the list.
// If it has NOT been displayed, display it.
if(!displayedNumbers.Contains(number))
{
sw.WriteLine(number + ": " + count);
displayedNumbers.Add(number);
}
}
Use a hash table instead of a list. Store the number (128590, ...) as the key and the number of times it has occurred as the value.
Before you insert a new value, check whether it is already present in the hash table by using the Contains operation; if it is, increment its value.
I think the biggest problem is getting from raw lines of your text field to individual values. My guess is this is a tab-delimited file with a known constant number of columns, in which case you could use String.Split() to separate the sub-strings. Once you have the strings separated, you can count the instances of the proper column pretty easily with a little LINQ. Given a list or collection of your file's lines:
var histogram = myListOfLines
//Split each string along spaces or tabs, and discard any zero-length strings
//caused by multiple adjacent delimiters.
.Select(s=>s.Split(new[]{'\t',' '}, StringSplitOptions.RemoveEmptyEntries))
//Optional; turn the array of strings produced by Split() into an anonymous type
.Select(a=>new{Col1=a[0], Col2=a[1], Col3=a[2], Col4=a[3], Col5=a[4]})
//Group based on the values of the second column.
.GroupBy(x=>x.Col2)
//Then, out of the grouped collection, get the count for each unique value of Col2.
.Select(gx=>new{gx.Key, Count=gx.Count()});
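To produce the numbered output the question asks for, the grouped result can be projected with its index. The sketch below is a self-contained C# 9+ top-level program that uses the six sample lines from the question. It relies on LINQ to Objects yielding groups in the order in which each key first appears. The names lines, counts, Line, Part and Count are chosen here for illustration.

```csharp
using System;
using System.Linq;

// The sample lines from the question.
string[] lines =
{
    "R34 128590 -74.498 109.728 0 0805_7",
    "R33 128590 -74.498 112.014 0 0805_7",
    "R15 128588 -68.910 127.254 0 0805_7",
    "R32 128587 -65.354 115.189 0 0805_7",
    "R35 128587 -65.354 117.348 0 0805_7",
    "R38 128590 -65.354 119.507 0 0805_7"
};

var counts = lines
    // Take the second column of each line, splitting on spaces or tabs.
    .Select(l => l.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)[1])
    // Group equal part numbers; groups come out in first-appearance order.
    .GroupBy(part => part)
    // Number the groups starting from 1.
    .Select((g, index) => new { Line = index + 1, Part = g.Key, Count = g.Count() })
    .ToList();

// Prints "1: 128590: 3", "2: 128588: 1", "3: 128587: 2"
foreach (var entry in counts)
    Console.WriteLine($"{entry.Line}: {entry.Part}: {entry.Count}");
```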
