Reading some rows in Excel - c#

I am having trouble reading specific rows of an Excel document.
My implementation is here: Reading Excel in C# where some columns are empty.
As you can see, the entire Excel sheet is loaded into a DataTable.
Now I want to get a specific range of n values:
I think the DataTable is the problem... Maybe I should obtain another DataTable, perhaps by using a different query?

It seems that you are using a fixed template, so you can solve this by running different queries on different ranges.
To select a particular range you can use this query:
string query = "SELECT * FROM [YourSheet$B58:D70]";
If you know where a range starts but not how many rows it has, you can use this syntax:
string query = "SELECT * FROM [YourSheet$B58:D]";
With HDR=NO in your connection string (the provider then names the columns F1, F2, F3, ...) and the range starting one row lower to skip the header row, you could use this query to simplify your next operations:
SELECT [F1] AS Compagnia,
[F2] AS Agenzia,
[F3] AS DataSinistro
FROM [YourSheet$B59:D]
Remember that you can also use WHERE to filter your results or exclude empty rows, e.g.:
SELECT [F1] AS Compagnia,
[F2] AS Agenzia,
[F3] AS DataSinistro
FROM [YourSheet$B59:D]
WHERE [F3] IS NOT NULL
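The range queries above can be assembled with a small helper. This is only a sketch: the class name `ExcelRangeQuery` and its `Build` method are hypothetical names introduced here, and it assumes the connection string uses HDR=NO, so that the columns are called F1, F2, F3, ... and can be aliased in order.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of any library) that builds range queries
// of the form shown above.
public static class ExcelRangeQuery
{
    // range is e.g. "B58:D70", or "B59:D" when the number of rows is unknown.
    // aliases are optional names for F1, F2, ... (assumes HDR=NO).
    public static string Build(string sheet, string range, params string[] aliases)
    {
        string columns = aliases.Length == 0
            ? "*"
            : string.Join(", ", aliases.Select((name, i) => $"[F{i + 1}] AS [{name}]"));
        return $"SELECT {columns} FROM [{sheet}${range}]";
    }
}
```

The resulting string can be passed to whatever OleDb command or adapter already fills the DataTable; a WHERE clause such as `WHERE [F3] IS NOT NULL` can be appended to it in the same way.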

Related

How to use FETCH in OleDb query?

I have xlsx table like this:
Name SubDatasetCount Parameter1 Parameter2 ParameterX .......
Dataset1
SubDataset1
SubDataset2
SubDatasetX
Dataset2
SubDataset1
SubDataset2
SubDatasetX
.
.
.
My goal is to load the parameters of any Dataset together with all of its SubDatasets.
The xlsx format and the reading method are fixed. At the moment I read the SubDatasetCount of Dataset1 and then try to run the following SQL query for the OleDbDataReader:
SELECT *
FROM ["SheetName"$]
WHERE Name LIKE '%DatasetName%'
FETCH NEXT [SubDatasetCount] ROWS ONLY
This causes an OleDbException: 'IErrorInfo.GetDescription failed with E_FAIL(0x80004005).'. Before I added FETCH, the query worked fine. I have no SQL knowledge; I copied it from here: How to select next rows from database in C#?
The linked answer states that ORDER BY is a must, but obviously I cannot do that.
Even when I tested the following query, the error was the same:
SELECT *
FROM ["SheetName"$]
WHERE Name LIKE '%DatasetName%'
ORDER BY Name
FETCH NEXT 10 ROWS ONLY
It works when I remove FETCH and leave ORDER BY. A quick search on that specific error always yields the same explanation: a reserved keyword is used in the query. But I don't see anything like that in the FETCH part of the query.
How do I make FETCH work?
If FETCH can be fixed somehow, how do I satisfy the ORDER BY requirement? ORDER BY (SELECT NULL) causes an exception.

Trouble accessing data from DataTable

I've got a DataTable that I'm trying to access row by row like so:
dataTable.Select("someID=" + someID.ToString()).CopyToDataTable().Rows.Count;
This works fine for someID values of 0-9, but when I get to 10 I get a System.InvalidOperationException. In the Visual Studio DataTable Visualizer I can see someID as one of the columns, with values of 0-24, so 10 should be there.
When I use the Immediate Window and look at dataTable.Select("someID=10") I get
{System.Data.DataRow[0]}, and looking at dataTable.Select("someID=9") gives me
{System.Data.DataRow[1]}
What am I missing?
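Why do you need the CopyToDataTable() method when all you need is the count of matches? CopyToDataTable() throws an InvalidOperationException when the source contains no rows, which is what happens when Select returns an empty array. You could simply use the Length property of the array that Select returns: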
x = dataTable.Select("someID=" + someID.ToString()).Length;
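To illustrate the difference, here is a minimal sketch on a made-up table. The class `SelectCountDemo` and its methods are hypothetical, and the table deliberately holds only the IDs 0 to 9, so that a lookup of 10 finds nothing. It does not explain why the asker's row with ID 10 is not matched; it only shows that Length is safe on an empty result while CopyToDataTable() is not.

```csharp
using System;
using System.Data;

// Hypothetical demo class; table contents are made up for illustration.
public static class SelectCountDemo
{
    public static DataTable BuildTable()
    {
        var table = new DataTable();
        table.Columns.Add("someID", typeof(int));
        for (int i = 0; i < 10; i++)
        {
            table.Rows.Add(i); // IDs 0..9 only
        }
        return table;
    }

    // Counting matches never throws: an empty result simply has Length 0.
    public static int CountMatches(DataTable table, int someID)
    {
        return table.Select("someID=" + someID).Length;
    }

    // Returns true when CopyToDataTable() rejects an empty result.
    public static bool CopyThrowsWhenEmpty(DataTable table, int someID)
    {
        try
        {
            table.Select("someID=" + someID).CopyToDataTable();
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```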

Find each string in a list from a table column

I have a table that has about 1 million rows. One of the columns is a string, let's call it column A.
Now I need to work on a list L of about 1,000 strings, mostly one or two words each, and I need to find all the records in the table where column A contains one of the 1,000 strings in the list L.
The only way I can think of is to do a full table scan for each string in L and check whether it is a substring of column A in each row. But that will be O(n^2), and for a million rows it will take a very long time.
Is there a better way? Either in SQL or in C# code?
One million rows is a relatively small number these days. You should be able to pull all strings from column A, along with your table's primary key, into memory, and do a regex search using a very long regex composed from your 1000 strings:
var regex = new Regex("string one|string two|string three|...|string one thousand");
Since the regex gets compiled into a finite automaton, you would get reasonably fast scanning times for your strings. Once your filtering is complete, collect the IDs, and query the full rows from the table using them.
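A minimal sketch of this approach follows. The class `TermMatcher` and its methods are hypothetical names, and the rows are assumed to have already been loaded into memory as (primary key, column A) pairs. Each term is passed through Regex.Escape so that characters such as `.` or `+` in the search strings are matched literally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; assumes the (id, text) pairs are already in memory.
public static class TermMatcher
{
    // Joins the escaped terms into one alternation: "term one|term two|...".
    public static Regex BuildRegex(IEnumerable<string> terms)
    {
        string pattern = string.Join("|", terms.Select(Regex.Escape));
        return new Regex(pattern, RegexOptions.Compiled);
    }

    // Returns the keys of all rows whose text contains at least one term.
    public static List<int> MatchingIds(IEnumerable<KeyValuePair<int, string>> rows, Regex regex)
    {
        return rows.Where(r => r.Value != null && regex.IsMatch(r.Value))
                   .Select(r => r.Key)
                   .ToList();
    }
}
```

The collected IDs can then be used to query the full rows, as described above. Note that an empty term list would produce an empty pattern, which matches every row, so the caller should guard against that case.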
The best way to do this is to use LINQ. Let's say that you have your list:
List<string> test = new List<string>{"aaa","ddd","ddsc"};
then using LINQ you can construct:
var match = YourTable.Where(t => test.Contains(t.YourFieldName));
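Note that `test.Contains(t.YourFieldName)` checks whether the whole column value equals one of the list entries. For the substring search described in the question, the check would have to be inverted. The following sketch contrasts the two; the class `ContainsDemo` is hypothetical and works on in-memory strings, so whether the same expression translates to SQL depends on the LINQ provider in use.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo on in-memory values (LINQ to Objects).
public static class ContainsDemo
{
    // Keeps values that are exactly equal to one of the terms.
    public static List<string> ExactMatches(IEnumerable<string> columnValues, List<string> terms)
        => columnValues.Where(v => terms.Contains(v)).ToList();

    // Keeps values that contain at least one term as a substring.
    public static List<string> SubstringMatches(IEnumerable<string> columnValues, List<string> terms)
        => columnValues.Where(v => v != null && terms.Any(t => v.Contains(t))).ToList();
}
```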
I suggest looking into full-text search. It won't decrease the number of operations you have to perform, but it will improve performance.
Assuming you use SQL Server (you should always use the relevant tag to specify the RDBMS),
you can create a DataTable from your List<string> and send it to a stored procedure as a table-valued parameter.
Inside the stored procedure you can use a simple join of that table-valued parameter to your table on database_table.col contains(table_parameter.value) (using full-text search).
Of course, things will go a lot faster if you create a full-text index, as suggested in the comments by Glorfindel.

Compare two DataTables that have the same schema and N columns

I have two DataTables; both have the same number of columns and the same column names. I need to compare them and find the rows that differ, meaning that even if a single cell doesn't match, the row should be returned. I tried:
table1.Merge(table2);
DataTable modified = table2.GetChanges();
But this is returning null.
Whereas
IEnumerable<DataRow> added = table1.AsEnumerable().Except(table2.AsEnumerable());
returns all of the table1 rows, even though only some cells in table1 differ from table2.
Can anyone help me with this comparison? The various sites I looked at said to compare each column in a row, but since I have N columns I can't go with that. I need a smarter, more efficient way of comparing.
Thanks in advance
IEnumerable<DataRow> added = table1.AsEnumerable().Except(table2.AsEnumerable());
should be changed to
IEnumerable<DataRow> added = table1.AsEnumerable().Except(table2.AsEnumerable(),DataRowComparer.Default);
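This is because DataRows don't know how to compare themselves to each other by value on their own; without a comparer, Except falls back to reference equality. You can also provide your own IEqualityComparer<DataRow> implementation instead of DataRowComparer.Default if required.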
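As a small illustration of the corrected call, the sketch below builds two tables with the same schema and returns the rows of the first that have no value-equal counterpart in the second. The class `RowDiff`, the helper `MakeTable`, and the two-column schema are assumptions made up for the example.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical demo; the Id/Name schema is made up for illustration.
public static class RowDiff
{
    public static DataTable MakeTable(params object[][] rows)
    {
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        foreach (var row in rows)
        {
            table.Rows.Add(row);
        }
        return table;
    }

    // DataRowComparer.Default compares rows column by column, by value.
    public static List<DataRow> RowsOnlyInFirst(DataTable first, DataTable second)
        => first.AsEnumerable()
                .Except(second.AsEnumerable(), DataRowComparer.Default)
                .ToList();
}
```

Swapping the arguments gives the rows that exist only in the second table, so running the comparison in both directions covers changed rows on either side.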
Have you tried using Merge? http://msdn.microsoft.com/en-us/library/fk68ew7b.aspx
dataTable1.Merge(dataTable2);
DataTable dataTable3 = dataTable2.GetChanges();

Is it possible to remove a duplicate value from a datatable in c#?

I have a DataTable with a column MobileNo. My DataTable has 150 rows and each row has a MobileNo. How can I check that every MobileNo is unique by going through all the DataRows of the DataTable in C#?
EDIT:
"This datatable is being created by reading a CSV file"
Use LINQ and group by MobileNo; then traverse the groups to see which MobileNo values have multiple records, and do whatever you wish to remove the ones you deem duplicates.
Edit: From Linq 101 Samples.
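The grouping approach described above might look like the following sketch. The class `DuplicateFinder` is a hypothetical name, and the MobileNo column is assumed to hold strings, which is plausible for a table read from a CSV file but is not stated in the question.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helper; assumes the MobileNo column is of type string.
public static class DuplicateFinder
{
    // Returns every MobileNo value that occurs in more than one row.
    public static List<string> DuplicatedMobileNos(DataTable table)
        => table.AsEnumerable()
                .GroupBy(row => row.Field<string>("MobileNo"))
                .Where(group => group.Count() > 1)
                .Select(group => group.Key)
                .ToList();
}
```

An empty result means every MobileNo is unique.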
Try
DataTable.DefaultView.ToTable(bool distinct, string[] ColumnNames)
The third overload of that method is what you're looking for, I believe; let me know how you get on.
Specify which columns are to be deduped in the string[],
and true or false for distinct records.
So you can either just select a subset of the data or dedupe by setting distinct to true.
You will need to do some jiggery-pokery to get your dataset how you want it, but I think that's what you're after.
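For the MobileNo case, a minimal sketch of that call could look like this. The wrapper class `DistinctDemo` is hypothetical; the resulting table contains only the columns that were named, so other columns of the original rows are not carried over.

```csharp
using System.Data;

// Hypothetical wrapper around DataView.ToTable(bool, params string[]).
public static class DistinctDemo
{
    // Returns a new table with one row per distinct MobileNo value.
    public static DataTable DistinctMobileNos(DataTable table)
        => table.DefaultView.ToTable(true, "MobileNo");
}
```

Comparing the row count of the result with the row count of the original table is a quick way to tell whether any MobileNo was duplicated.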
