LinqToExcel: How do I exclude certain rows? - c#

I've been struggling with this for a few days now and I'm stumped. I'm hoping that someone can provide an alternate suggestion.
Basically, I'm reading data from Excel using LinqToExcel, but I want to exclude all rows with a "Rating" of "NR". Here's a sample of my data:
CompanyName Rating SalesMan
Apple 2 Steve
Google NR Steve
Microsoft 3 John
Dell 1 Steve
Pepsi 3 John
I just want to find all companies that belong to Steve but don't have a rating of "NR". My final list should be:
CompanyName SalesMan
Apple Steve
Dell Steve
I've tried the following code but it doesn't work:
1)
var masterList = masterDataXL.Worksheet("data_all").Where(d => !d["Rating"].Equals("NR"));
2)
var masterList = masterDataXL.Worksheet("data_all")
    .Where(m =>
        !m["Rating"].Equals("NR")
        &&
        m["SalesMan"].ToString().Contains(resAnLastName)) // check for last name
    .Select(m => new ResAnTicksDataClass
    {
        Company = m["CompanyName"],
        Rating = m["Rating"],
        Seller = m["SalesMan"]
    }).AsEnumerable();
3) Created a property for Rating and did the following:
var masterList = masterDataXL.Worksheet("data_all")
    .Where(m =>
        m["Analyst"].ToString().Contains(resAnLastName)) // check for last name
    .Select(m => new ResAnTicksDataClass
    {
        Company = m["CompanyName"],
        Rating = m["Rating"],
        Seller = m["SalesMan"]
    }).AsEnumerable();
var dataList = (from m in masterList
                where m.Rating != "NR"
                select new ResAnTicksDataClass
                {
                    ResAnName = m.ResAnName,
                    DescrTick = m.DescrTick
                }).AsEnumerable();
I'm open to any other suggestions that you might have because I'm completely stumped. Thank you so much in advance.

I suggest you select the 'Rating' column in your Excel file and do a search and replace on the selection (change 'NR' to '0'), then filter. Using a single data type for the column should help.
As phoog said, when an Excel file is converted into a table, each column must be given a type. To decide on that type, the conversion looks only at the first 10 rows of your Excel file. So if your file doesn't have an 'NR' value in the first 10 rows, the column type is set to INT, and converting the value 'NR' then fails.
A simple trick to fix this is to add a row to your Excel file, just before your first data row, containing values of the data type you want each column to use.
As an example, if a column holds text values and some of them are longer than 255 characters, make sure at least one of the first 10 rows has a text value of 256 characters or more. Otherwise, when the table is created, the column will be set to VARCHAR(255) instead of VARCHAR(MAX), and the conversion will crash on text longer than 255 characters.
Conclusion: always make sure the first 10 rows use the right type and size to fit all the rows of your Excel file!

In your first sample you should change this:
d => !d["Rating"].Equals("NR")
to this:
d => d["Rating"] != "NR"
It could also be written in a cleaner way:
var masterList =
    from d in masterDataXL.Worksheet("data_all")
    where d["Rating"] != "NR"
    select d;
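To see the combined filter (not "NR", and belonging to Steve) in isolation, here is a minimal sketch that runs against in-memory rows rather than a real worksheet. The SheetRow class and the method names are hypothetical stand-ins, not part of LinqToExcel; only the shape of the Where clause carries over.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RatingFilterSketch
{
    // Hypothetical stand-in for one worksheet row; not a LinqToExcel type.
    public sealed class SheetRow
    {
        public string CompanyName { get; set; }
        public string Rating { get; set; }
        public string SalesMan { get; set; }
    }

    // The sample data from the question.
    public static List<SheetRow> SampleRows()
    {
        return new List<SheetRow>
        {
            new SheetRow { CompanyName = "Apple", Rating = "2", SalesMan = "Steve" },
            new SheetRow { CompanyName = "Google", Rating = "NR", SalesMan = "Steve" },
            new SheetRow { CompanyName = "Microsoft", Rating = "3", SalesMan = "John" },
            new SheetRow { CompanyName = "Dell", Rating = "1", SalesMan = "Steve" },
            new SheetRow { CompanyName = "Pepsi", Rating = "3", SalesMan = "John" }
        };
    }

    // Keeps the rows for the given salesman whose rating is not "NR"
    // and returns the company names.
    public static List<string> CompaniesFor(IEnumerable<SheetRow> rows, string salesMan)
    {
        return rows
            .Where(r => r.Rating != "NR" && r.SalesMan.Contains(salesMan))
            .Select(r => r.CompanyName)
            .ToList();
    }

    public static void Main()
    {
        foreach (var company in CompaniesFor(SampleRows(), "Steve"))
        {
            Console.WriteLine(company);
        }
    }
}
```

With the question's sample data this leaves Apple and Dell for Steve, which matches the expected final list.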


C# - EPPlus sometimes only returning cells with values, breaking my data gathering

The long story short is I'm trying to convert the data from 3 different Excel documents into 5 separate CSV files, using a combination of data from all of them to do it. 2 of the 3 files are working without issues, but one of the files contains slightly different data - though there are 9 total columns being utilized (41,730 rows), only 3-5 of the columns will have data in each row other than the first (header row). The issue comes up in that it doesn't actually even include the columns without data...so throwing all of the data into an array list has varying numbers of segments in the individual arrays (so I can't associate the data properly).
Here's the code I'm running:
using (ExcelPackage xlPackage = new ExcelPackage(new System.IO.FileInfo(strInputFile)))
{
    ExcelWorksheet myWorksheet = xlPackage.Workbook.Worksheets.First();
    int totalRows = myWorksheet.Dimension.End.Row;
    int totalColumns = myWorksheet.Dimension.End.Column;
    for (int rowNum = 1; rowNum <= totalRows; rowNum++)
    {
        var row = myWorksheet.Cells[rowNum, 1, rowNum, totalColumns].Select(c => c.Value == null ? string.Empty : c.Value.ToString());
        listOutput.Add(string.Join("~", row).Split('~'));
    }
}
This works perfectly for the others, but in this file the first row has 9 segments, then every subsequent row has 3-5, depending on how many columns have values (the first 2 will always have values, then 1-2 additional columns will in each row). The other files are filling in the blank columns with empty strings, using the lambda in the Select, but I don't know why it's not doing it in this one. All 3 came from the same source (client environment export), and have the same formatting.
Most likely myWorksheet.Cells[rowNum, 1, rowNum, totalColumns] simply isn't returning the cells that don't have values. Try something like this:
var row = Enumerable.Range(1, totalColumns)
.Select(columnNum => myWorksheet.Cells[rowNum, columnNum])
.Select(c => c?.Value?.ToString() ?? string.Empty);
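The idea behind that suggestion is to drive the loop by column number rather than by whichever cells happen to exist. A minimal sketch of the same idea without EPPlus, assuming a sparse row is modelled as a dictionary from 1-based column number to value (the dictionary and the Expand helper are illustrative, not EPPlus API):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SparseRowSketch
{
    // Expands a sparse row (1-based column number -> value) into a fixed-width
    // array, substituting an empty string for every column without a value.
    public static string[] Expand(IDictionary<int, object> sparseRow, int totalColumns)
    {
        return Enumerable.Range(1, totalColumns)
            .Select(col => sparseRow.TryGetValue(col, out var value)
                ? value?.ToString() ?? string.Empty
                : string.Empty)
            .ToArray();
    }

    public static void Main()
    {
        // Only columns 1, 2 and 5 hold values, as in the problem file.
        var sparse = new Dictionary<int, object> { { 1, "A" }, { 2, "B" }, { 5, 42 } };
        string[] row = Expand(sparse, 9);
        Console.WriteLine(row.Length);
        Console.WriteLine(string.Join("~", row));
    }
}
```

Building the array directly also avoids the Join("~") followed by Split('~') round trip, which would split a row incorrectly if any cell value contained a "~".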

Get the highest value from a given column in a datatable

I have a datatable that has a column "Pending Approval Number". I need to get the highest number in that column to display to a user as the next available. I have seen some examples of how to do this but I have not been able to get them to work. I know i could loop each DataRow in the DataTable and check the value and store it if it is higher than the last. But I know there has to be a better way.
Here is how the DataTable is filled.
strSQL = "Select * from JobInvoice";
DataTable dtApprovalNumber = new DataTable();
MySqlDataAdapter daApprovalNumber = new MySqlDataAdapter(strSQL, conn);
daApprovalNumber.Fill(dtApprovalNumber);
A change to the SQL query or code to pull it from the datatable are both welcome.
EDIT: After getting the solution for my original numeric column, I found the second column that I need to do this for is string. The solution was also provided below.
If you want to get the highest value from the DataTable in code rather than in SQL, you can use LINQ like this:
int highestNumber = dtApprovalNumber.AsEnumerable().Max(x => x.Field<int>("SomeIntegerColumn"));
EDIT.
Following your comment: if you want the maximum of a string column that holds numbers (I don't see why it would be stored that way), you can use something like this:
int highestNumber = dtApprovalNumber.AsEnumerable().Max(x => int.Parse(x.Field<string>("SomeStringColumn")));
Please note that this will fail if any of those string values cannot be converted, in which case you will have to do it another way.
EDIT 2
Since I've just tried it, I'll share the case where you have a string column and you are not sure whether all of its values can be converted (for example, some might be empty). See below:
int tempVariable;
int highestNumber = dt.AsEnumerable()
.Where(x => int.TryParse(x.Field<string>("SomeColumn"), out tempVariable))
.Max(m => int.Parse(m.Field<string>("SomeColumn")));
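A variant of the same approach that parses each value only once and also copes with a column in which nothing parses (Max on an empty sequence of int throws). This is a sketch against a hand-built DataTable; the helper names and sample values are made up for illustration.

```csharp
using System;
using System.Data;
using System.Linq;

public static class MaxColumnSketch
{
    // Hand-built sample data standing in for the JobInvoice table.
    public static DataTable SampleTable()
    {
        var table = new DataTable();
        table.Columns.Add("Pending Approval Number", typeof(string));
        table.Rows.Add("7");
        table.Rows.Add("");
        table.Rows.Add("12");
        table.Rows.Add(DBNull.Value);
        table.Rows.Add("abc");
        return table;
    }

    // Returns the largest value in a string column that parses as an int,
    // or null when no row holds a parsable value.
    public static int? MaxParsable(DataTable table, string columnName)
    {
        return table.AsEnumerable()
            .Select(row => int.TryParse(row.Field<string>(columnName), out var n) ? (int?)n : null)
            .Max();
    }

    public static void Main()
    {
        int? max = MaxParsable(SampleTable(), "Pending Approval Number");
        // Treat "no parsable value yet" as 0, so the first number offered is 1.
        Console.WriteLine("Next available: " + ((max ?? 0) + 1));
    }
}
```

Empty strings, nulls and non-numeric text are simply skipped, so the caller decides what an empty result should mean.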
Select max(`Pending Approval Number`) from JobInvoice
Add 1 to that maximum to show it as the next available number.

Adding a property that sums the results of a query in Lightswitch 2012

I have a project using C# in Lightswitch 2012 that has the following tables:
Clients
Id - Integer
CaseID - Long Integer
FullName - String
Address - String
Tracking - TrackingItem Collection
Staff
Id - Integer
PIN - Integer
FullName - String
Tracking - TrackingItem Collection
Tracking
Id - Integer
Client - Client
Staff - StaffItem
StartDate - Date
StartTime - DateTime
EndTime - DateTime
Units - Double (calculated field)
TogetherTime - Boolean
Relationships are as follows: Each tracking object must have at least one Client and at least one Staff, and each Client and Staff can have many Tracking objects. I currently have a query called TrackingFilter that lets users filter the Tracking table on a search screen called SearchTrackingFilter by client name, staff name, a date range, and whether or not the item is marked as together time. This also displays the calculated field "Units" in the results table. What I am trying to do is add a text field to the screen above the results table that shows the total number of units that the query returned with whatever criteria the user selected. I'm a bit stuck at this point and don't know what to do. I can add labels and such to the screen just fine, but I can't seem to edit any sort of code or anything that would let me add up the total number of units returned by the query. Any help would be appreciated.
I had the same requirement.
You can access the DataWorkspace with this.DataWorkspace.ApplicationData and add your filter parameters to it.
For example:
IDataServiceQueryable<Invoice> queryInvoiceTotal = this.DataWorkspace.ApplicationData.Invoices;
if (drdCustomer != null)
    queryInvoiceTotal = queryInvoiceTotal.Where(q => q.Customer.Id == this.drdCustomer.Id);
if (InvoiceState != null)
    queryInvoiceTotal = queryInvoiceTotal.Where(q => q.InvoiceState == InvoiceState);
if (InvoiceDateStart != null)
    queryInvoiceTotal = queryInvoiceTotal.Where(q => q.Date >= InvoiceDateStart);
if (InvoiceDateEnd != null)
    queryInvoiceTotal = queryInvoiceTotal.Where(q => q.Date <= InvoiceDateEnd);
var data = queryInvoiceTotal.Execute();
decimal? totalNetto = data.Sum(q => q.SumNetto);
You could also access the so-called VisualCollection query in the screen, but please note that this would only sum the data currently visible in the screen/grid. For example, if you are showing only 45 items per page, it would sum only those 45 items.

Linq Objects Group By & Sum

I have a datatable with a column "No" and "Total" column. I'd like to bring back the sum of the "Total" column for each "No". I've tried the below, but I'm getting several errors as I'm struggling with the syntax.
var Values =
(from data in DtSet.Tables["tblDetails"].AsEnumerable()
group data by data.Field<"No">
select new
{
name_1 = data.Field<double>("No"),
name_2 = data.Field<double>("Total"),
}
);
This will give you sum of Total fields in name_2 property, and grouping No in name_1 property (I think you need better naming here)
var Values = from row in DtSet.Tables["tblDetails"].AsEnumerable()
group row by row.Field<double>("No") into g
select new
{
name_1 = g.Key,
name_2 = g.Sum(r => r.Field<double>("Total"))
};
Consider names such as No and TotalSum instead.
You started with LINQ, but the good old DataTable has its own way:
var total = DtSet.Tables["tblDetails"].Compute("sum(Total)", "No = x");
I have to leave it with the x in it, because I don't know the values of the "No" column. The part "No = x" is a filter. If it is null or an empty string all rows will be used in the computation.
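To make the two approaches concrete, here is a small sketch that runs both against a hand-built table: Compute with a real filter value in place of the x, and the LINQ grouping. The sample values and helper names are invented for illustration; the column names "No" and "Total" come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;
using System.Linq;

public static class GroupSumSketch
{
    // Hand-built sample data: two rows for No = 1, one row for No = 2.
    public static DataTable SampleTable()
    {
        var table = new DataTable("tblDetails");
        table.Columns.Add("No", typeof(double));
        table.Columns.Add("Total", typeof(double));
        table.Rows.Add(1.0, 10.0);
        table.Rows.Add(1.0, 5.5);
        table.Rows.Add(2.0, 3.0);
        return table;
    }

    // Sum of Total for a single No value via DataTable.Compute.
    public static double SumFor(DataTable table, double no)
    {
        string filter = "[No] = " + no.ToString(CultureInfo.InvariantCulture);
        object result = table.Compute("Sum(Total)", filter);
        // Compute returns DBNull when no row matches the filter.
        return result is DBNull ? 0.0 : Convert.ToDouble(result);
    }

    // Sum of Total for every No value via LINQ grouping.
    public static Dictionary<double, double> SumByNo(DataTable table)
    {
        return table.AsEnumerable()
            .GroupBy(r => r.Field<double>("No"))
            .ToDictionary(g => g.Key, g => g.Sum(r => r.Field<double>("Total")));
    }

    public static void Main()
    {
        DataTable table = SampleTable();
        Console.WriteLine(SumFor(table, 1.0));
        foreach (var pair in SumByNo(table))
        {
            Console.WriteLine(pair.Key + ": " + pair.Value);
        }
    }
}
```

Compute answers the question for one "No" at a time, while the grouping produces all the sums in a single pass, so the choice mostly depends on whether you need one total or all of them.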

How to remove a particular row value from a datatable based on a condition

I have a DataTable like this
name age
------------
kumar 27
kiran 29
anu 24
peter 34
tom 26
manu 35
sachin 37
geetha 23
Now I have another DataTable like this with one column:
name
----
manu
tom
anu
I need to compare the value in the column name here and remove all the rows that share the same name. Now the result output should be like this:
name age
------------
kumar 27
kiran 29
peter 34
sachin 37
geetha 23
How can I achieve this result?
Assuming you mean .NET datatables and not SQL tables. One approach is the following. Convert your second datatable to a dictionary, something like this:
// create a lookup table to ease the process
// (assumes the names in removeNames are unique; ToDictionary throws on duplicate keys)
Dictionary<string, string> dict = (from row in removeNames.AsEnumerable()
                                   select (string) row["name"])
                                  .ToDictionary(k => k);
You must loop from the last row to the first, otherwise you get problems (you cannot use foreach here; it will throw an exception).
// loop bottom to top
for (var i = firstDT.Rows.Count - 1; i >= 0; i--)
{
    var row = firstDT.Rows[i];
    if (dict.ContainsKey((string) row["name"]))
    {
        firstDT.Rows.Remove(row);
    }
}
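For reference, a self-contained sketch of the same reverse loop, using a HashSet in place of the dictionary since only key lookups are needed (a HashSet also tolerates duplicate names). The table contents come from the question; the class and method names are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class RemoveRowsSketch
{
    // Builds a one-column or two-column table from "name" or "name,age" items.
    public static DataTable BuildTable(bool withAge, params string[] items)
    {
        var table = new DataTable();
        table.Columns.Add("name", typeof(string));
        if (withAge) table.Columns.Add("age", typeof(int));
        foreach (string item in items)
        {
            string[] parts = item.Split(',');
            if (withAge) table.Rows.Add(parts[0], int.Parse(parts[1]));
            else table.Rows.Add(parts[0]);
        }
        return table;
    }

    // Removes every row of target whose name appears in namesToRemove.
    public static void RemoveMatching(DataTable target, DataTable namesToRemove)
    {
        var names = new HashSet<string>(
            namesToRemove.AsEnumerable().Select(r => r.Field<string>("name")));

        // Walk backwards so removing a row never shifts an index still to be visited.
        for (int i = target.Rows.Count - 1; i >= 0; i--)
        {
            if (names.Contains(target.Rows[i].Field<string>("name")))
            {
                target.Rows.RemoveAt(i);
            }
        }
    }

    public static void Main()
    {
        DataTable people = BuildTable(true, "kumar,27", "kiran,29", "anu,24", "peter,34",
                                      "tom,26", "manu,35", "sachin,37", "geetha,23");
        DataTable remove = BuildTable(false, "manu", "tom", "anu");
        RemoveMatching(people, remove);
        foreach (DataRow row in people.Rows)
        {
            Console.WriteLine(row["name"] + " " + row["age"]);
        }
    }
}
```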
Edit: better solution
Partially inspired by a little discussion with Jerod below, I figured there had to be a way to do this with a LINQ expression that's easy to read and apply. Previously, it didn't work, because I enumerated the DataRowView itself, which, when you try to remove something, will raise an InvalidOperationException, telling you that the collection was modified. Here's what I did, and it works because we use Select to create a copy of the row items:
// TESTED, WORKS
// assuming table contains {name, age} and removeNames contains {name}
// selecting everything that must be removed
var toBeRemoved = from row in table.Select()
join remove in removeNames.AsEnumerable()
on row["name"] equals remove["name"]
select row;
// marking the rows as deleted, this will not raise an InvalidOperationException
foreach (var row in toBeRemoved)
row.Delete();
One way to do it is:
1) Create an expression to match the rows, i.e. "name='manu' OR name='tom' OR name='anu'".
2) Execute the expression against the original data table to select the rows to delete.
3) Loop through and delete all of the matching rows.
Assumptions: table = original data table, table2 = filter data table.
string expression = string.Join( " OR ", table2.AsEnumerable().Select(
row => string.Format( "name='{0}'", row["name"] )).ToArray() );
foreach( DataRow row in table.Select( expression ) )
{
row.Delete();
}
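Two details are worth guarding against when building a filter string this way: a name containing an apostrophe breaks the expression unless the apostrophe is doubled, and an empty filter string makes Select return every row. The sketch below handles both; the helper names are invented, and calling AcceptChanges at the end is an assumption about what the caller wants, since Delete only marks previously accepted rows as deleted.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class FilterDeleteSketch
{
    // Builds "name='a' OR name='b'", doubling apostrophes so that names
    // such as o'neil stay valid inside the expression.
    public static string BuildNameFilter(IEnumerable<string> names)
    {
        return string.Join(" OR ",
            names.Select(n => "name='" + n.Replace("'", "''") + "'"));
    }

    // Deletes the rows whose name is listed and returns how many were deleted.
    public static int DeleteByName(DataTable table, IEnumerable<string> names)
    {
        string filter = BuildNameFilter(names);
        if (filter.Length == 0)
        {
            return 0; // an empty filter would select (and delete) every row
        }

        DataRow[] matches = table.Select(filter); // array copy, safe to iterate while deleting
        foreach (DataRow row in matches)
        {
            row.Delete();
        }
        table.AcceptChanges(); // commits all pending changes, including the deletions
        return matches.Length;
    }

    public static void Main()
    {
        var table = new DataTable();
        table.Columns.Add("name", typeof(string));
        foreach (string n in new[] { "manu", "o'neil", "tom", "kumar" })
        {
            table.Rows.Add(n);
        }
        table.AcceptChanges();

        int deleted = DeleteByName(table, new[] { "manu", "o'neil" });
        Console.WriteLine(deleted + " deleted, " + table.Rows.Count + " left");
    }
}
```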
Why don't you make a simple query :
DELETE FROM table1 WHERE table1.name IN (SELECT name FROM table2)
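If you know the table that you wish to remove rows from, you could do something like this. The inner loop iterates over the array returned by Select(), which is a copy of the rows, because removing rows from a collection while enumerating it with foreach throws an InvalidOperationException.
foreach (DataRow rowTable2 in dataSet.Tables["Table2"].Rows)
{
    foreach (DataRow rowTable1 in dataSet.Tables["Table1"].Select())
    {
        if (rowTable1["NameColumn"].ToString() == rowTable2["NameColumn"].ToString())
        {
            dataSet.Tables["Table1"].Rows.Remove(rowTable1);
        }
    }
}
That is a very simple way to get what you need done.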
