Is there a way to allow data to get truncated if it's too long when inserting it from one row into another? What's happening now is that it gets an error and doesn't add the row if one of the fields is too long.
Here is the piece of code that I have:
DataRow dr = dt.NewRow();
for (int j = 0; j < edi50.Columns.Count; j++)
    dr[j] = dr50[j];              // copy each field from the source row
dt.Rows.Add(dr);
try
{
    RemoveNulls(dt);
    daEDI40050.Update(dt);        // fails when a value is longer than its target column
}
catch (Exception e)
{
    string m = e.Message;
}
I have a description field that is 25 characters long, but the data going into it is 34 characters. I want it to insert the first 25 characters, truncate the rest, and still add the row.
Thank you
You could get the schema from the database first (untested):
DataTable schema;
using (var con = new System.Data.SqlClient.SqlConnection(conStr))
{
    var getSchemaSql = String.Format("SELECT * FROM {0}", tableName);
    using (var schemaCommand = new System.Data.SqlClient.SqlCommand(getSchemaSql, con))
    {
        con.Open();
        using (var reader = schemaCommand.ExecuteReader(CommandBehavior.SchemaOnly))
        {
            schema = reader.GetSchemaTable();
        }
    }
}
and then something similar to this:
for (int j = 0; j < schema.Rows.Count; j++)
{
    DataRow schemaRow = schema.Rows[j];
    Type dataType = schemaRow.Field<Type>("DataType");
    int columnSize = schemaRow.Field<int>("ColumnSize");
    if (dataType.FullName == "System.String")
    {
        String value = dr50[j] as String;
        if (value != null && value.Length > columnSize)
            value = value.Substring(0, columnSize);   // clip to the column size; assign this back when building the row (e.g. dr[j] = value)
    }
}
Again, this is totally untested and written from scratch, but it might give you an idea of how to get the column size. Of course this works only for strings, but I assume that is what you want.
If you know which column has the limit, and which field that column maps to, then simply truncate the field's value on all objects before calling Update():
if (myObject.StringField != null && myObject.StringField.Length > 25)
    myObject.StringField = myObject.StringField.Substring(0, 25);
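If you don't know the column sizes up front, the two answers can be combined: use the ColumnSize values from the schema table to clip every string column in the DataTable before the update. A minimal sketch (untested), reusing the schema, dt, RemoveNulls and daEDI40050 names from above, and assuming dt's columns are in the same order as the SELECT * result:
for (int j = 0; j < schema.Rows.Count && j < dt.Columns.Count; j++)
{
    Type dataType = schema.Rows[j].Field<Type>("DataType");
    int columnSize = schema.Rows[j].Field<int>("ColumnSize");
    if (dataType != typeof(String) || columnSize <= 0)
        continue;                                      // only clip string columns with a known size

    foreach (DataRow row in dt.Rows)
    {
        string s = row[j] as string;
        if (s != null && s.Length > columnSize)
            row[j] = s.Substring(0, columnSize);       // keep the first columnSize characters
    }
}

RemoveNulls(dt);
daEDI40050.Update(dt);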
I have a DataTable with 1 column and a List of DataTables with 2 columns each.
I want to compare the Value of the DataTable with the first 6 digits of the Value of each DataTable in the List row by row.
This is my Code:
for (int fs = 0; fs < dataTable.Rows.Count; fs++)
{
    for (int fs2 = 0; fs < dataTableList.Count; fs2++)
    {
        for (int fs3 = 0; fs3 < dataTableList[fs2].Rows.Count; fs3++)
        {
            if (dataTable.Rows[fs]["columnName"].ToString().Equals(dataTableList[fs2].Rows[fs3]["otherColumnName"].ToString().Substring(0, 6)))
            {
                //do sth.
            }
        }
    }
}
When the program reaches if(dataTable.Rows[fs]["columnName"].ToString().Equals(dataTableList[fs2].Rows[fs3]["otherColumnName"].ToString().Substring(0,6))) it stops and I get a System.ArgumentOutOfRangeException.
Does anybody know what I am doing wrong? When I show the substring in a MessageBox it works.
So, first of all, I suggest refactoring this monstrosity to foreach loops.
foreach (DataRow row in dataTable.Rows)
{
    foreach (DataTable otherDataTable in dataTableList)
    {
        foreach (DataRow otherRow in otherDataTable.Rows)
        {
            /* ... */
        }
    }
}
And then check whether the string you're taking the substring of actually has a length of 6 or more.
const int compareLength = 6;
const string columnName = "columnName";
const string otherColumnName = "otherColumnName";
foreach (DataRow row in dataTable.Rows)
{
    foreach (DataTable otherDataTable in dataTableList)
    {
        foreach (DataRow otherRow in otherDataTable.Rows)
        {
            var value = row[columnName].ToString();
            var otherValue = otherRow[otherColumnName].ToString();
            if (otherValue.Length >= compareLength &&
                value == otherValue.Substring(0, compareLength))
            {
                /* Do something. */
            }
        }
    }
}
My bet is that the Substring call was the problem whenever the compared value was shorter than 6 characters. See if this helps.
I'm working in a closed environment where I cannot install additional packages and have limited ability to use .NET Framework classes. Plus, I have no control over the format of the CSV file I'm receiving.
I receive a CSV file that must be pulled into our business system and used to update the database.
I can pull the file into a DataTable via the code below.
CSV File Ex:
Order# Qty Description ...
12345 3 desc1, desc2, desc3, etc..
while (!sr.EndOfStream)
{
    string[] rows = sr.ReadLine().Split(',');
    DataRow dr = dt.NewRow();
    for (int i = 0; i < rows.Length; i++)
    {
        dr[i] = rows[i];
    }
    dt.Rows.Add(dr);
}
However, the problem is that one field in the CSV file is a description that contains multiple "," characters. Doing the above loads each comma-separated chunk of the description value into its own index in the rows array.
There should be a total of 10 columns in the CSV file, but because of the description field the number of columns varies depending on the number of commas in the description: 10, 15, 22 columns, etc.
I have no control over the format of the CSV file before it's sent. Is there any way to get around this? Even skipping over this field when creating the DataTable would be fine for my purposes.
Thanks
You can use a text qualifier to enclose every field so that commas or semicolons inside it are not treated as delimiters. The following method should fix the problem; it uses the CsvHelper package:
Install-Package CsvHelper
public static DataTable ReadCSVToDataTable(string path, string delimiter, string textQualifier)
{
    CsvHelper.Configuration.CsvConfiguration config = new CsvHelper.Configuration.CsvConfiguration();
    config.Delimiter = delimiter;
    config.Encoding = new UTF8Encoding(false);
    if (string.IsNullOrEmpty(textQualifier))
    {
        config.QuoteAllFields = false;
    }
    else
    {
        char qualifier = textQualifier.ToCharArray()[0];
        config.Quote = qualifier;
        config.QuoteAllFields = true;
    }
    DataTable dt = new DataTable();
    using (var sr = new StreamReader(path))
    {
        using (var reader = new CsvReader(sr, config))
        {
            int j = 0;
            while (reader.Read())
            {
                if (j == 0)
                {
                    if (config.HasHeaderRecord)
                    {
                        foreach (string header in reader.FieldHeaders)
                            dt.Columns.Add(header);
                    }
                    else
                    {
                        for (int i = 0; i < reader.CurrentRecord.Length; i++)
                            dt.Columns.Add();
                    }
                    j++;
                }
                AddRow(dt, reader);   // AddRow: helper (not shown) that copies the current record's fields into a new DataRow
            }
        }
    }
    return dt;
}
Fstagger, this should work for you assuming you have only one column with internal commas and the CSV is formed properly (in particular, that the Description field begins with ," and ends with ",). You need to replace my example INDEX_OF_DESCRIPTION with the actual value.
int iDescStart = 0;
int iDescEnd = 0;
string zLine = "";
const int INDEX_OF_DESCRIPTION = 3;
const char SEPARATOR = '\u001F'; // ASCII Unit Separator, decimal 31

while (!sr.EndOfStream)
{
    zLine = sr.ReadLine();
    iDescStart = zLine.IndexOf(",\"");
    iDescEnd = zLine.IndexOf("\",");
    // swap the commas inside the quoted description for a placeholder so Split() leaves them alone
    zLine = zLine.Substring(0, iDescStart)
        + ","
        + zLine.Substring(iDescStart + 2, iDescEnd - iDescStart - 2).Replace(',', SEPARATOR)
        + ","
        + zLine.Substring(iDescEnd + 2);
    string[] zaFields = zLine.Split(',');
    zaFields[INDEX_OF_DESCRIPTION] = zaFields[INDEX_OF_DESCRIPTION].Replace(SEPARATOR, ',');   // restore the commas
    DataRow dr = dt.NewRow();
    for (int i = 0; i < zaFields.Length; i++)
    {
        dr[i] = zaFields[i];
    }
    dt.Rows.Add(dr);
}
Let me know if this works for you : )
It looks like your CSV has fixed-size columns padded with spaces. So I guess you'd be better off reading a fixed number of characters for each column and trimming the trailing spaces, instead of splitting on commas.
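Something along those lines (a rough sketch; the widths below are made up, so measure them from your actual file; sr and dt are the reader and table from your snippet):
int[] widths = { 8, 5, 40 /* ...one entry per column, in order */ };

while (!sr.EndOfStream)
{
    string line = sr.ReadLine();
    DataRow dr = dt.NewRow();
    int pos = 0;
    for (int i = 0; i < widths.Length && pos < line.Length; i++)
    {
        int len = Math.Min(widths[i], line.Length - pos);
        dr[i] = line.Substring(pos, len).Trim();   // take the fixed-width slice, drop the padding
        pos += widths[i];
    }
    dt.Rows.Add(dr);
}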
Try this class.
It deals with commas how you need.
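For reference, a minimal quote-aware splitter illustrates what such a class does under the hood (this is not the linked class, just a sketch): commas inside double quotes are kept as part of the field instead of acting as delimiters.
// requires System.Collections.Generic and System.Text
static List<string> SplitCsvLine(string line)
{
    var fields = new List<string>();
    var current = new StringBuilder();
    bool inQuotes = false;
    foreach (char c in line)
    {
        if (c == '"')
            inQuotes = !inQuotes;              // toggle quoted state, drop the quote itself
        else if (c == ',' && !inQuotes)
        {
            fields.Add(current.ToString());    // a comma outside quotes ends the field
            current.Clear();
        }
        else
            current.Append(c);
    }
    fields.Add(current.ToString());            // last field
    return fields;
}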
My solution that ended up working:
while (!sr.EndOfStream)
{
    string[] rows = sr.ReadLine().Split(',');
    var fullrow = String.Empty;
    foreach (var entry in rows)
    {
        fullrow += entry.ToString() + ",";
    }
    var startQuote = fullrow.IndexOf("\"");
    var endQuote = fullrow.IndexOf("\"", startQuote + 1); //LastIndexOf("\"");
    if (startQuote > -1 && endQuote > -1)
    {
        var substring = fullrow.Substring(startQuote, Math.Abs(startQuote - endQuote));
        substring = substring.Replace(',', ' ');
        fullrow = fullrow.Remove(startQuote, Math.Abs(startQuote - endQuote)).Insert(startQuote, substring);
    }
    rows = fullrow.Split(',');
    DataRow dr = dt.NewRow();
    for (int i = 0; i < rows.Length; i++)
    {
        dr[i] = rows[i];
    }
    dt.Rows.Add(dr);
}
Thanks @Michael Gorsich for the alternate code!
I am trying to compute the sum for the same IDs in each column of the data table. The datatable contains empty elements. When I run the following code, at the line calculating ColP_sum it gives me the error "Specified cast is not valid". It seems to be caused by the empty elements in the data table? How should I solve it? I am sure this code works if the datatable is filled with numbers.
for (int i = 0; i < LoadIDcount; i++)
{
    string IDnow = LoadID[i, 0];
    string IDsaved = LoadP_dt.Rows[i][0].ToString();
    if (LoadP_dt.Rows[i][0].ToString() == IDnow)
    {
        for (int j = 0; j < 8760; j++)
        {
            string colPnow = SP_dt.Columns[j * 2 + 4].ColumnName.ToString();
            double ColP_sum = (double)SP_dt.Compute(String.Format("Sum([{0}])", colPnow), String.Format("Load_ID = '{0}'", IDnow));
            string colQnow = SP_dt.Columns[j * 2 + 5].ColumnName.ToString();
            double ColQ_sum = (double)SP_dt.Compute(String.Format("Sum([{0}])", colQnow), String.Format("Load_ID = '{0}'", IDnow));
            LoadP_dt.Rows[i][j + 2] = ColP_sum;
            LoadQ_dt.Rows[i][j + 2] = ColQ_sum;
            Console.WriteLine("{0} {1}", i, j);
        }
    }
    else
    {
        Console.WriteLine("ID does not match");
    }
}
CSVfilewriter(CSVPpath, LoadP_dt);   // save the Load_P datatable to a CSV file
CSVfilewriter(CSVQpath, LoadQ_dt);   // save the Load_Q datatable to a CSV file
//CSVfilewriter(CSVSPpath, SP_dt);   // save the service point datatable to a CSV file
if "colPnow" is not a number, that could explain it: the "Compute" and "Sum" both appear to be expecting number value
I write some data into a CSV file from lists. Some list indexes have an empty string while others have a value; in those cases the data is compared with another list written to the same CSV file.
This is my CSV file opened in Excel: the third column contains the ID for the cell in the second column, so in later rows I want to detect the name for that ID based on the previous rows.
For example, in row 3 the ID is 19 and the name is I/O, so in the 7th row, where the ID is also 19, I want to fill in the second cell.
Info: the IDs are already known above, and any new ID will have appeared in an earlier row first.
The entries are added by the following code.
bool isInList = ms.IndexOf(ShapeMaster) != -1;
if (isInList)
{
    savinglabelnamefortextbox = t.InnerText;
    string replacement =
        Regex.Replace(savinglabelnamefortextbox, @"\t|\n|,|\r", "");
    xl.Add("");
    dl.Add(replacement);
    ms.Add(ShapeMaster);
}
And I use the following code to write to the CSV file:
using (StreamWriter sw = File.CreateText(csvfilename))
{
    for (int i = 0; i < dl.Count; i++)
    {
        var line = String.Format("{0},{1},{2}", dl[i], xl[i], ms[i]);
        sw.WriteLine(line);
    }
}
Try this:
for (int x = 0; x < ms.Count; x++)
{
    if (xl[x] != "")
        continue;                 // name already filled in

    for (int y = 0; y < xl.Count; y++)
    {
        if (ms[y] == ms[x])
        {
            xl[x] = xl[y];        // copy the name from the first row with the same ID
            break;
        }
    }
}
I have the following code written in C#, but at this rate it would take me 4-5 days to migrate the data from an Oracle database to Elasticsearch. I am inserting the records in batches of 100. Is there any way to make the migration of the 4 million records faster (preferably in less than a day)?
public static void Selection()
{
    for (int i = 1; i < 4000000; i += 1000)
    {
        for (int j = i; j < (i + 1000); j += 100)
        {
            OracleCommand cmd = new OracleCommand(BuildQuery(j), oracle_connection);
            OracleDataReader reader = cmd.ExecuteReader();
            List<Record> list = CreateRecordList(reader);
            insert(list);
        }
    }
}

private static List<Record> CreateRecordList(OracleDataReader reader)
{
    List<Record> l = new List<Record>();
    string[] str = new string[7];
    try
    {
        while (reader.Read())
        {
            for (int i = 0; i < 7; i++)
            {
                str[i] = reader[i].ToString();
            }
            Record r = new Record(str[0], str[1], str[2], str[3], str[4], str[5], str[6]);
            l.Add(r);
        }
    }
    catch (Exception er)
    {
        string msg = er.Message;
    }
    return l;
}

private static string BuildQuery(int from)
{
    int to = from + change - 1;
    StringBuilder builder = new StringBuilder();
    builder.AppendLine(@"select * from");
    builder.AppendLine("(");
    builder.AppendLine("select FIELD_1, FIELD_2, FIELD_3, FIELD_4, FIELD_5, FIELD_6, FIELD_7, ");
    builder.Append(" row_number() over(order by FIELD_1) rn");
    builder.AppendLine(" from tablename");
    builder.AppendLine(")");
    builder.AppendLine(string.Format("where rn between {0} and {1}", from, to));
    builder.AppendLine("order by rn");
    return builder.ToString();
}

public static void insert(List<Record> l)
{
    try
    {
        foreach (Record r in l)
            client.Index<Record>(r, "index", "type");
    }
    catch (Exception er)
    {
        string msg = er.Message;
    }
}
The ROW_NUMBER() function is going to negatively impact performance, and you're running it thousands of times. You're already using an OracleDataReader -- it will not pull all four million rows to your machine at once, it's basically streaming them one or a few at a time.
This has to be doable in minutes or hours, not days -- we have several processes that move millions of records between Sybase and SQL Server in a similar manner, and it takes less than five minutes.
Maybe give this a shot:
OracleCommand cmd = new OracleCommand("SELECT ... FROM TableName", oracle_connection);
int batchSize = 500;
using (OracleDataReader reader = cmd.ExecuteReader())
{
    List<Record> l = new List<Record>(batchSize);
    string[] str = new string[7];
    int currentRow = 0;
    while (reader.Read())
    {
        for (int i = 0; i < 7; i++)
        {
            str[i] = reader[i].ToString();
        }
        l.Add(new Record(str[0], str[1], str[2], str[3], str[4], str[5], str[6]));

        // Commit every time batchSize records have been read
        if (++currentRow == batchSize)
        {
            Commit(l);
            l.Clear();
            currentRow = 0;
        }
    }

    // commit remaining records
    Commit(l);
}
Here's what Commit might look like:
public void Commit(IEnumerable<Record> records)
{
    // TODO: Use ES's BULK features, I don't know the exact syntax
    client.IndexMany<Record>(records, "index", "type");
    // client.Bulk(b => b.IndexMany(records))... something like this
}
But you are not inserting in batches of 100
In the end you are inserting one at a time
(and that may not even be the correct code to insert one)
foreach (Record r in l)
    client.Index<Record>(r, "index", "type");
All those gyrations on read do nothing if the insert is one row at a time.
You are just introducing lag while you get the next batch.
Read is (almost) always faster than write.
OracleCommand cmd = new OracleCommand(BuildQuery(all), oracle_connection);
OracleDataReader reader = cmd.ExecuteReader();
while (reader.Read())
{
    client.Index<Record>(new Record(reader.GetString(0),
        reader.GetString(1), reader.GetString(2), reader.GetString(3),
        reader.GetString(4), reader.GetString(5), reader.GetString(6)),
        "index", "type");
}
reader.Close();
You could use a BlockingCollection if you want to read and write in parallel.
But use a bounded capacity so the read does not get too far ahead of the write.
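A rough sketch of that idea (untested; Record, batchSize, cmd and client are assumed to exist as in the snippets above): one task reads from Oracle and queues batches, another bulk-indexes them into Elasticsearch, and the bounded capacity of 10 batches keeps the reader from running too far ahead of the writer.
var queue = new System.Collections.Concurrent.BlockingCollection<List<Record>>(boundedCapacity: 10);

var producer = System.Threading.Tasks.Task.Run(() =>
{
    var batch = new List<Record>(batchSize);
    using (OracleDataReader reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            batch.Add(new Record(reader[0].ToString(), reader[1].ToString(),
                reader[2].ToString(), reader[3].ToString(), reader[4].ToString(),
                reader[5].ToString(), reader[6].ToString()));
            if (batch.Count == batchSize)
            {
                queue.Add(batch);              // blocks if the consumer falls 10 batches behind
                batch = new List<Record>(batchSize);
            }
        }
    }
    if (batch.Count > 0) queue.Add(batch);
    queue.CompleteAdding();                    // tells the consumer no more batches are coming
});

var consumer = System.Threading.Tasks.Task.Run(() =>
{
    foreach (var batch in queue.GetConsumingEnumerable())
        client.IndexMany<Record>(batch, "index", "type");   // bulk insert each batch
});

System.Threading.Tasks.Task.WaitAll(producer, consumer);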