Error importing data with FoxPro OLEDB driver - c#

I am importing some data from a FoxPro database into a SQL Server database using the FoxPro OLE DB driver. The approach I am taking is to loop through the FoxPro tables, select all records into a DataTable and then use SqlBulkCopy to insert that table into SQL Server. This works fine except for a few instances where I get the following error:
System.InvalidOperationException: The provider could not determine the Decimal value. For example, the row was just created, the default for the Decimal column was not available, and the consumer had not yet set a new Decimal value.
I have investigated this and logged which rows it appears with, and the issue is that the FoxPro table has a fixed width for a numeric value: 1 is stored as 1.00, but 10 is stored as 10.0, and it is the single digit after the decimal point that is causing the issue. Having found the cause, though, I am struggling to fix it. The following function is what I am using to convert an OleDbDataReader to a DataTable:
private DataTable FPReaderToDataTable(OleDbDataReader dr, string TableName)
{
    DataTable dt = new DataTable();
    //get datareader schema
    DataTable SchemaTable = dr.GetSchemaTable();
    List<DataColumn> cols = new List<DataColumn>();
    if (SchemaTable != null)
    {
        foreach (DataRow drow in SchemaTable.Rows)
        {
            string columnName = drow["ColumnName"].ToString();
            DataColumn col = new DataColumn(columnName, (Type)(drow["DataType"]));
            col.Unique = (bool)drow["IsUnique"];
            col.AllowDBNull = (bool)drow["AllowDBNull"];
            col.AutoIncrement = (bool)drow["IsAutoIncrement"];
            cols.Add(col);
            dt.Columns.Add(col);
        }
    }
    //populate data
    int RowCount = 1;
    while (dr.Read())
    {
        DataRow row = dt.NewRow();
        for (int i = 0; i < cols.Count; i++)
        {
            try
            {
                row[cols[i]] = dr[i];
            }
            catch (Exception ex)
            {
                if (i > 0)
                {
                    LogImportError(TableName, cols[i].ColumnName, RowCount, ex.ToString(), dr[0].ToString());
                }
                else
                {
                    LogImportError(TableName, cols[i].ColumnName, RowCount, ex.ToString(), "");
                }
            }
        }
        RowCount++;
        dt.Rows.Add(row);
    }
    return dt;
}
What I would like to do is check for values that have the one-decimal-place issue, but I am unable to read from the data reader at all in these cases. I would have thought that I could use dr.GetString(i) on the offending rows, however this returns the following error:
The provider could not determine the String value. For example, the row was just created, the default for the String column was not available, and the consumer had not yet set a new String value.
I am unable to update the FoxPro data as the column does not allow it, so how can I read the record from the DataReader and fix it? I have tried all combinations of casting, dr.GetValue and dr.GetData, and all give variations on the same error.
The structure of the FoxPro table is as follows:
Number of data records: 1664
Date of last update:    11/15/10
Code Page:              1252
Field  Field Name  Type       Width  Dec  Index  Collate  Nulls  Next  Step
    1  AV_KEY      Numeric        6       Asc    Machine  No
    2  AV_TEAM     Numeric        6                       No
    3  AV_DATE     Date           8                       No
    4  AV_CYCLE    Numeric        2                       No
    5  AV_DAY      Numeric        1                       No
    6  AV_START    Character      8                       No
    7  AV_END      Character      8                       No
    8  AV_SERVICE  Numeric        6                       No
    9  AV_SYS      Character      1                       No
   10  AV_LENGTH   Numeric        4    2                  No
   11  AV_CWEEKS   Numeric        2                       No
   12  AV_CSTART   Date           8                       No
** Total **                      61
It is the av_length column which is causing the problem
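One thing I have not yet tried is converting the column inside the FoxPro query itself, so that the provider never has to map the over-wide N(4,2) value to a Decimal. As far as I know VFP 9's SQL dialect supports CAST(), so something along these lines might work (untested; the folder path and the "availability" table name are placeholders for my real ones):
//Sketch only: widen AV_LENGTH in the VFP query so the provider gets a value
//that fits the declared precision. Path and table name are placeholders.
string connStr = @"Provider=VFPOLEDB.1;Data Source=C:\data\;";
string sql = "SELECT av_key, av_team, av_date, av_cycle, av_day, av_start, av_end, " +
             "av_service, av_sys, CAST(av_length AS N(10,2)) AS av_length, " +
             "av_cweeks, av_cstart FROM availability";
using (OleDbConnection con = new OleDbConnection(connStr))
using (OleDbCommand cmd = new OleDbCommand(sql, con))
{
    con.Open();
    using (OleDbDataReader dr = cmd.ExecuteReader())
    {
        DataTable dt = FPReaderToDataTable(dr, "availability");
        //...SqlBulkCopy dt into SQL Server as before
    }
}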

I don't know if you have access to Visual FoxPro itself, but it has an upsizing "wizard" that will upload directly to SQL Server.
It looks like there is a free trial download from MS via Download Visual FoxPro 9, SP2.
It may also be an issue with memo / blob type columns that are not getting properly interpreted.

You mentioned type-casting, but I'm not sure how you've attempted it... In your try/catch where you have
row[cols[i]] = dr[i];
you might want to explicitly test the column's data type and FORCE the cast... something like the following (I'm not positive of the exact object reference for DataType.ToString() below; you'll have to confirm that while running / debugging):
if (cols[i].DataType.ToString().ToLower().Contains("int"))
    row[cols[i]] = (int)dr[i];
else
    row[cols[i]] = dr[i];
You could obviously test for other types too...
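For the Decimal case specifically, a variation on the same idea (only a sketch, untested against the VFP provider) is to compare the schema column's CLR type directly rather than string-matching the type name; AssignCell here is a hypothetical helper you could call from the per-column loop:
//Sketch: branch on the column's CLR type; Convert handles whatever boxed
//numeric the provider hands back. Untested against the VFP provider.
private static void AssignCell(DataRow row, DataColumn col, OleDbDataReader dr, int i)
{
    object raw = dr[i];                    //may still throw for the bad N(4,2) values
    if (col.DataType == typeof(int))
        row[col] = Convert.ToInt32(raw);   //whole-number Numeric fields
    else if (col.DataType == typeof(decimal))
        row[col] = Convert.ToDecimal(raw); //Numeric fields with decimals
    else
        row[col] = raw;                    //everything else unchanged
}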

From your listed structure of the table, what it is doing IS correct. In VFP, for the table structure listed, AV_LENGTH is of type Numeric with a length of 4, of which 2 are allocated for decimal positions. So it can at MOST hold a value of "9.99": VFP limits input to the field to 2 decimal positions, 1 character for the decimal point and the rest for the whole-number portion.
The rest of the numeric-based fields are Numeric with a length but NO decimal positions, which indicates they are all WHOLE numbers and hence would qualify as integer data types. Numeric with decimals should go into a float or double column type.
That being said, I don't know HOW you are even getting a 10.0 value into a Numeric(4,2). This is the FIRST time I've ever seen a number larger than the structure's allocated width actually get stored in the field like this.

I don't recall the reason why FoxPro has this problem. I think it has something to do with how numbers are stored. Regardless of that, the solution is either (A) clean up the data or (B) re-size the field to allow a larger value. The sample code below demonstrates the problem.
* create a table that can store a value between -0.99 and 9.99
CREATE TABLE "TEST.DBF" (av_length N(4,2))
* insert values between 1.1 and 22,222.22222
INSERT INTO "TEST" (av_length) VALUES (1.1)
INSERT INTO "TEST" (av_length) VALUES (2.2)
INSERT INTO "TEST" (av_length) VALUES (11.11)
INSERT INTO "TEST" (av_length) VALUES (22.22)
INSERT INTO "TEST" (av_length) VALUES (111.111)
INSERT INTO "TEST" (av_length) VALUES (222.222)
INSERT INTO "TEST" (av_length) VALUES (1111.1111)
INSERT INTO "TEST" (av_length) VALUES (2222.2222)
INSERT INTO "TEST" (av_length) VALUES (11111.11111)
INSERT INTO "TEST" (av_length) VALUES (22222.22222)
* view the contents of the table
* note that records 3 to 10 do not match the field definition
BROWSE NORMAL
IF MESSAGEBOX("Fix the Data? Select to Change the Field Definition", 0+4+32) = 6
    * Solution A: fix the data, and view the table contents again
    REPLACE ALL av_length WITH MIN(av_length, 9.99) IN "TEST"
    BROWSE NORMAL
ELSE
    * Solution B: change the field definition, and view the table contents again
    * note that records 9 & 10 still need to be fixed
    ALTER TABLE "TEST.DBF" ALTER COLUMN av_length N(12,6)
    BROWSE NORMAL
ENDIF

Related

c#: Oracle/OracleDataReader adds additional '0' after decimal point if amount of comma numbers equals scale

I found some strange behavior when retrieving data from my Oracle database. I've got a table field defined as NUMBER(20,3).
var ordinal = 100;
var decimalValue = _reader[ordinal]; //100.1550
//assuming ODP.NET (Oracle.DataAccess.Client)
OracleDecimal decVal = ((Oracle.DataAccess.Client.OracleDataReader)_reader).GetOracleDecimal(ordinal);
string decString = decVal.ToString(); //100.155
Console.WriteLine(decVal.Value); //100.1550
Retrieving a value like 100.10 from the OracleDataReader works fine (ToString returns 100.10). But when the number of decimal digits equals the scale (3), as with 100.155, an additional 0 is added, making the value 100.1550. In my view this additional 0 should not be added given the scale, but the fact is it is. Perhaps the value is stored as 100.1550 in Oracle?
My question is: why is this 0 added when the scale is 3, and how can I retrieve the value correctly from the database?
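For now the only workaround I can see is to normalise the value after converting it to a .NET decimal (the "G29" format string drops the trailing zeros a decimal otherwise keeps), but that feels like treating the symptom rather than the cause, so I'd still like to understand the behaviour:
//Sketch: read the column as a plain decimal and format away the trailing zero.
decimal value = ((Oracle.DataAccess.Client.OracleDataReader)_reader).GetDecimal(ordinal);
string display = value.ToString("G29"); //100.155 rather than 100.1550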

Print all the columns of a DB table, without writing each column name in the print statement

I have a table of students in SQL Server and I want to execute a query like
SELECT *
FROM tbl_students;
but I don't want to write out each column index, GetValue(0), GetValue(1) and so on, in C# to get the result. I wrote the following statement:
Console.WriteLine("{0},\t{1}", sqlDReader.GetValue(0), sqlDReader.GetValue(1));
I just want to get all the column values without writing each column index number. Can't we simply get a string of the complete record, preferably with spaces or tabs in between?
You can. Assuming you are using C#, something like this would work.
If you know the number of columns:
string completeLine = "";
for (int i = 0; i < numCols; i++)
{
    completeLine += sqlDReader.GetValue(i).ToString();
    if (i < numCols - 1)
        completeLine += " ";
}
Console.WriteLine(completeLine);
This assumes all columns can be converted to a string, and that you know the number of columns. There are more complex ways to do this.
To get the number of columns you can use the reader's FieldCount (or .Columns.Count on a DataTable).
Then have an outer loop for each row. Note: do not try to do this type of string concatenation over the entire table, as it will be slow (strings are immutable, so use a StringBuilder instead).
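A more compact version of the same idea (a sketch, assuming a SqlDataReader named sqlDReader as in the question) uses FieldCount and string.Join, so there is no repeated string concatenation:
//Sketch: print every column of every row, tab-separated, without naming columns.
while (sqlDReader.Read())
{
    object[] values = new object[sqlDReader.FieldCount];
    sqlDReader.GetValues(values);                 //copies all column values at once
    Console.WriteLine(string.Join("\t", values)); //tab-separated record
}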

Check if any rows in a DataTable column cannot convert from string to integer

I have a DataTable which holds scores in one column. The problem is that the score can be an integer or a string, for example 15 (integer score) or "Advanced" (string score). I don't know what type of scores the column is going to contain, but I want to make a decision after I check whether any score in the rows of the column contains a string that cannot convert to an integer. I am using two DataTables: one that has the unknown scores (dt) and one that will be filled with all the data and scores after I do certain operations (dtCloned). I want to convert the data type of the score column in dtCloned before I copy in all the scores, by doing these checks. For example, if I have a column with these scores:
ScoreValue Column
Advanced
Basic
Proficient
Below Basic
If any scores in this column cannot convert to an integer (in this case none can) then I do not want to convert the column to an integer datatype, otherwise I do. More examples:
Scorevalue column
1
2
3
4
This would pass the test because all the values can convert to an integer. It would proceed and convert the column to integer datatype.
ScoreValue column
0
Advanced
Proficient
This would not pass because there are string values that cannot convert to an integer.
This is what I have so far; it just checks whether the score in row 10, column 13 can convert to an integer (there's no particular reason for row 10, I just don't know a better way to do this; column 13 is the score column).
int number;
bool tryConvert = Int32.TryParse(Convert.ToString(dt.Rows[10][13]), out number);
if (tryConvert)
    dtCloned.Columns[13].DataType = typeof(Int32);
foreach (DataRow row in dt.Rows)
{
    dtCloned.ImportRow(row);
}
So, is there any way for me to not have to hard-code a specific row number for the check? Ideally, if any row in the column has a string value that cannot convert to an integer, I do not want to convert the column data type to integer. I know that LINQ has a method "Any". Would that work in this situation?
In your case, All will be your friend, like so:
int number;
if (dt.Rows.Cast<DataRow>().All(x => int.TryParse(Convert.ToString(x[13]), out number)))
    dtCloned.Columns[13].DataType = typeof(Int32);
Hope this helps...
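Putting that together with the copy loop from the question (a sketch; dt and dtCloned as defined there), note that the column type has to be set before any rows are imported, because a DataColumn's DataType cannot be changed once the table contains data:
//Sketch: decide the column type first, then copy the rows across.
int number;
bool allNumeric = dt.Rows.Cast<DataRow>()
                    .All(x => int.TryParse(Convert.ToString(x[13]), out number));
if (allNumeric)
    dtCloned.Columns[13].DataType = typeof(Int32); //only safe while dtCloned is empty
foreach (DataRow row in dt.Rows)
{
    dtCloned.ImportRow(row);
}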

Padding 0's - MySQL

So, I have a column that is my key column and auto-increments, so it can't be varchar or anything fun.
Please hold back the "Erhmahgerd urse werb contrerls" as I like to control my own HTML flow and don't like handing it over to .NET. I've never had good experiences with that (and I like my code to be compliant). I wouldn't like this to be a flame war or anything - I just want to pad with zeroes. I feel the need to say this because it's happened way too many times before.
So, anyway.
DataTable tabledata = new DataTable();
using (OdbcConnection con = new OdbcConnection(conString)) {
    using (OdbcCommand com = new OdbcCommand("SELECT * FROM equipment_table", con)) {
        OdbcDataAdapter myAdapter = new OdbcDataAdapter();
        myAdapter.SelectCommand = com;
        try {
            con.Open();
            myAdapter.Fill(tabledata);
        } catch (Exception ex) {
            throw (ex);
        } finally {
            con.Close();
        }
    }
}
Response.Write("<table id=\"equipment_listtable\"><thead><tr><th>Equipment ID</th><th>Equipment Name</th><th>Equipment Description</th><th>Type</th><th>In Use?</th><th>iOS Version (if applicable)</th><th>Permission Level</th><th>Status</th><th>Asset Tag</th><th>Details</th><th>Change</th><th>Apply</th></tr></thead>");
foreach (DataRow row in tabledata.Rows) {
    int counter = (int)row["Equipment_ID"];
    Response.Write("<tr>");
    foreach (var item in row.ItemArray) {
        Response.Write("<td>" + item + "</td>");
    }
    Response.Write("This stuff is irrelevant to my problem, so it is being left out... It uses counter though, so no complaining about not using variables...");
}
Response.Write("</table>");
As you can imagine, the value of my key column comes out like so in the generated table:
1
10
11
12
13
14
15
16
17
18
19
20
2
21
etc. I'd like to fix this with 0 padding. What is the best way to do this? Is there a way to target a SPECIFIC field while I'm generating the table? I've looked into DataRow.Item, but I've always found the MSDN documentation to be a bit difficult to comprehend.
Alternatively, could I SELECT * and then use mysql's lpad on ONE specific field within the *?
Thanks!
SELECT * is generally not a good idea to use; it inevitably causes more problems than the time it saves when writing the query. Specifying the columns explicitly will also allow you to use LPAD on the one column that needs it.
I was about to suggest using something like:
Response.Write("<td>" + item.ToString().PadLeft(2, '0') + "</td>");
But since you are just looping round each item and rendering them all the same way, the above would pad every cell.
So I think your best option is to change your query to specify every column; then you can pad the field as you want.
Or use an ORDER BY if you are only concerned that they aren't being ordered correctly (i.e. ordered as chars, not ints).
Alternatively, create a variable for each cell read from the database and render each separately.
This will give you more customisation options, should you require them.
You really should always specify your column names explicitly and not use * anyway - see here.
If you insist on using * then just bring the padded value in as another field:
SELECT *, LPAD(Equipment_ID, 2, '0') AS Equipment_ID_Padded FROM equipment_table
Remember LPAD will truncate if your Equipment_ID is longer than 2 digits.
A better solution may be to just pad the values in code using String.Format or ToString("D2"):
string paddedString = string.Format("{0:d2}", (int)row["Equipment_ID"]);
You can add padding in C# by using .ToString("D" + total number of digits);
e.g. if counter = 34 and you call counter.ToString("D5"), you'll get 00034.
If you're using strings, the easiest way would be to Convert.ToInt32() the value and then apply the above.
If you'd rather keep using strings, just look into --printf whups wrong language-- String.Format.
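To target just the one field while the table is being generated (a sketch based on the code in the question; the pad width of 4 is arbitrary), loop over the columns by name instead of over row.ItemArray and pad only Equipment_ID:
//Sketch: pad only the Equipment_ID cell; every other cell is written as-is.
foreach (DataRow row in tabledata.Rows) {
    Response.Write("<tr>");
    foreach (DataColumn col in tabledata.Columns) {
        string cell = col.ColumnName == "Equipment_ID"
            ? ((int)row[col]).ToString("D4") //zero-pad to 4 digits
            : row[col].ToString();
        Response.Write("<td>" + cell + "</td>");
    }
    Response.Write("</tr>");
}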

SQL huge selection of IDs - How to make it faster?

I have an array with a huge number of IDs that I would like to select from the DB.
The usual approach would be to do SELECT blabla FROM xxx WHERE yyy IN (ids) OPTION (RECOMPILE).
(The OPTION (RECOMPILE) is needed, because SQL Server is not intelligent enough to see that putting this query in its query cache is a huge waste of memory.)
However, SQL Server is horrible at this type of query when the number of IDs is high; the parser that it uses is simply too slow.
Let me give an example:
SELECT * FROM table WHERE id IN (288525, 288528, 288529,<about 5000 ids>, 403043, 403044) OPTION (RECOMPILE)
Time to execute: ~1100 msec (This returns appx 200 rows in my example)
Versus:
SELECT * FROM table WHERE id BETWEEN 288525 AND 403044 OPTION (RECOMPILE)
Time to execute: ~80 msec (This returns appx 50000 rows in my example)
So even though I get 250 times more data back, it executes 14 times faster...
So I built this function to take my list of ids and build something that will return a reasonable compromise between the two (something that doesn't return 250 times as much data, yet still gives the benefit of parsing the query faster)
private const int MAX_NUMBER_OF_EXTRA_OBJECTS_TO_FETCH = 5;
public static string MassIdSelectionStringBuilder(
    List<int> keys, ref int startindex, string colname)
{
    const int maxlength = 63000;
    if (keys.Count - startindex == 1)
    {
        string idstring = String.Format("{0} = {1}", colname, keys[startindex]);
        startindex++;
        return idstring;
    }
    StringBuilder sb = new StringBuilder(maxlength + 1000);
    List<int> individualkeys = new List<int>(256);
    int min = keys[startindex++];
    int max = min;
    sb.Append("(");
    const string betweenAnd = "{0} BETWEEN {1} AND {2}\n";
    for (; startindex < keys.Count && sb.Length + individualkeys.Count * 8 < maxlength; startindex++)
    {
        int key = keys[startindex];
        if (key > max + MAX_NUMBER_OF_EXTRA_OBJECTS_TO_FETCH)
        {
            if (min == max)
                individualkeys.Add(min);
            else
            {
                if (sb.Length > 2)
                    sb.Append(" OR ");
                sb.AppendFormat(betweenAnd, colname, min, max);
            }
            min = max = key;
        }
        else
        {
            max = key;
        }
    }
    if (min == max)
        individualkeys.Add(min);
    else
    {
        if (sb.Length > 2)
            sb.Append(" OR ");
        sb.AppendFormat(betweenAnd, colname, min, max);
    }
    if (individualkeys.Count > 0)
    {
        if (sb.Length > 2)
            sb.Append(" OR ");
        string[] individualkeysstr = new string[individualkeys.Count];
        for (int i = 0; i < individualkeys.Count; i++)
            individualkeysstr[i] = individualkeys[i].ToString();
        sb.AppendFormat("{0} IN ({1})", colname, String.Join(",", individualkeysstr));
    }
    sb.Append(")");
    return sb.ToString();
}
It is then used like this:
List<int> keys; //Sort and make unique
...
for (int i = 0; i < keys.Count;)
{
    string idstring = MassIdSelectionStringBuilder(keys, ref i, "id");
    string sqlstring = string.Format("SELECT * FROM table WHERE {0} OPTION (RECOMPILE)", idstring);
However, my question is...
Does anyone know of a better/faster/smarter way to do this?
In my experience the fastest way was to pack numbers in binary format into an image. I was sending up to 100K IDs, which works just fine:
Mimicking a table variable parameter with an image
Yet that was a while ago. The following articles by Erland Sommarskog are up to date:
Arrays and Lists in SQL Server
If the list of IDs were in another table that was indexed, this would execute a whole lot faster using a simple INNER JOIN.
If that isn't possible, then try creating a table variable, like so:
DECLARE @tTable TABLE
(
    Id int
)
Store the IDs in the table variable first, then INNER JOIN to your table xxx. I have had limited success with this method, but it's worth a try.
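From the C# side, one way to get the IDs into something indexed and joinable (a sketch; connectionString, the #ids temp table and the final SELECT are illustrative, and keys is the sorted, de-duplicated List<int> from the question) is to create a temp table on the open connection, SqlBulkCopy the IDs into it, and then join:
//Sketch: push the ID list into a #temp table with SqlBulkCopy, then join against it.
//The temp table lives as long as this connection stays open.
DataTable idTable = new DataTable();
idTable.Columns.Add("Id", typeof(int));
foreach (int id in keys)
    idTable.Rows.Add(id);
using (SqlConnection con = new SqlConnection(connectionString))
{
    con.Open();
    using (SqlCommand create = new SqlCommand("CREATE TABLE #ids (Id int PRIMARY KEY)", con))
    {
        create.ExecuteNonQuery();
    }
    using (SqlBulkCopy bulk = new SqlBulkCopy(con))
    {
        bulk.DestinationTableName = "#ids";
        bulk.WriteToServer(idTable);
    }
    using (SqlCommand select = new SqlCommand(
        "SELECT t.* FROM [table] t INNER JOIN #ids i ON i.Id = t.id", con))
    using (SqlDataReader reader = select.ExecuteReader())
    {
        //consume the rows...
    }
}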
You're using (key > max+MAX_NUMBER_OF_EXTRA_OBJECTS_TO_FETCH) as the check to determine whether to do a range fetch instead of an individual fetch. It appears that's not the best way to do that.
Let's consider the 4 ID sequences {2, 7}, {2, 8}, {1, 2, 7} and {1, 2, 8}.
They translate into
ID BETWEEN 2 AND 7
ID IN (2, 8)
ID BETWEEN 1 AND 7
ID BETWEEN 1 AND 2 OR ID IN (8)
The decision to fetch and filter the IDs 3-6 now depends only on the difference between 2 and 7/8. However, it does not take into account whether 2 is already part of a range or an individual ID.
I think the proper criterion is how many individual IDs you save. Converting two individuals into a range has a net benefit of 2 * Cost(individual) - Cost(range), whereas extending a range has a net benefit of Cost(individual) - Cost(range extension).
Adding RECOMPILE is not a good idea. Precompiling means SQL Server does not save your query results, but it does save the execution plan, which is what makes the query faster. If you add RECOMPILE then it will always have the overhead of compiling the query. Try creating a stored procedure that holds the query and calling that instead, as stored procedures are precompiled.
Another dirty idea, similar to Neil's:
Have an indexed view which holds the IDs alone, based on your business condition, and join the view with your actual table to get the desired result.
The efficient way to do this is to:
Create a temporary table to hold the IDs
Call a SQL stored procedure with a string parameter holding all the comma-separated IDs
The SQL stored procedure uses a loop with CHARINDEX() to find each comma, then SUBSTRING to extract the string between two commas and CONVERT to make it an int, and uses INSERT INTO #Temporary VALUES ... to insert it into the temporary table
INNER JOIN the temporary table or use it in an IN (SELECT ID from #Temporary) subquery
Every one of these steps is extremely fast because a single string is passed, no compilation is done during the loop, and no substrings are created except the actual id values.
No recompilation is done at all when this is executed as long as the large string is passed as a parameter.
Note that in the loop you must track the prior and current comma positions in two separate variables
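On the C# side this approach just means joining the IDs into one string and passing it as a single parameter (a sketch; the stored procedure name GetRowsByIdList is hypothetical and would contain the CHARINDEX/SUBSTRING loop described above, and con is an open SqlConnection):
//Sketch: pass all IDs as one comma-separated parameter so the statement text
//never changes and no recompilation is triggered.
string idList = string.Join(",", keys); //keys is the List<int> from the question
using (SqlCommand cmd = new SqlCommand("GetRowsByIdList", con))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.Add("@IdList", SqlDbType.VarChar, -1).Value = idList; //varchar(max)
    using (SqlDataReader reader = cmd.ExecuteReader())
    {
        //consume rows...
    }
}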
Off the cuff here - does incorporating a derived table help performance at all? I am not set up to test this fully, just wonder if this would optimize to use between and then filter the unneeded rows out:
SELECT *
FROM (SELECT *
      FROM dbo.table
      WHERE ID BETWEEN <lowerbound> AND <upperbound>) AS range
WHERE ID IN (
1206,
1207,
1208,
1209,
1210,
1211,
1212,
1213,
1214,
1215,
1216,
1217,
1218,
1219,
1220,
1221,
1222,
1223,
1224,
1225,
1226,
1227,
1228,
<...>,
1230,
1231
)
