Reading CSV data with commas in the data - C#

I have a project that does a bulk import into SQL from a CSV file. Creating the data columns works, but I'm running into a problem with the rows. A comma is used as the delimiter, which works fine for the column names but not for the data rows: some fields contain a comma separating surname and name, and only every second field (column) is enclosed in double quotes. As a result, the rows are split into more columns than they should be. I suggested changing the delimiter to a semicolon, which works fine, but the customer won't accept it because they don't want to change anything.
This is what I've done:
private static DataTable ImportFordEmailList(string csvFilePath)
{
DataTable csvData = new DataTable();
DataTable dt = new DataTable();
dt.Columns.Add("ColumnName");
dt.Rows.Clear();
try
{
using (TextFieldParser csvReader = new TextFieldParser(csvFilePath))
{
// csvReader.TextFieldType = FieldType.Delimited;
csvReader.SetDelimiters(new string[] { "," });
csvReader.HasFieldsEnclosedInQuotes = false;
csvReader.TrimWhiteSpace = true;
string[] colFields = csvReader.ReadFields();
foreach (string column in colFields)
{
if (dt.Rows.Count > 0)
{
string newColumn = Regex.Replace(column, "[^A-Za-z0-9]", "");
string findColum = "ColumnName = '" + newColumn.Trim() + "'";
DataRow[] foundRows = dt.Select(findColum);
if (foundRows.Length == 0)
{
DataRow dr = dt.NewRow();
dr["ColumnName"] = newColumn.Trim();
dt.Rows.Add(dr);
}
else
{
DataRow dr = dt.NewRow();
dr["ColumnName"] = newColumn.Trim() + "1";
dt.Rows.Add(dr);
}
}
else
{
string newColumn = column.Replace("'", "");
newColumn = newColumn.Replace(" ", "");
string clean = Regex.Replace(newColumn, "[^A-Za-z0-9 ]", "");
DataRow dr = dt.NewRow();
dr["ColumnName"] = clean.Trim();
dt.Rows.Add(dr);
}
}
foreach (DataRow row in dt.Rows)
{
string colName = Regex.Replace(row["ColumnName"].ToString().Trim(), "/^[ A-Za-z0-9]*$/", "");
DataColumn datecolumn = new DataColumn(colName);
datecolumn.AllowDBNull = true;
csvData.Columns.Add(datecolumn);
}
while (!csvReader.EndOfData)
{
string[] fieldData = csvReader.ReadFields();
for (int i = 0; i < fieldData.Length; i++)
{
if (fieldData[i] == "")
{
fieldData[i] = null;
}
}
foreach (string s in fieldData)
{
s.Replace("\"","");
Regex.Replace(s, "/^[ A-Za-z0-9 '#.()]", "");
string a = s;
}
csvData.Rows.Add(fieldData);
}
}
}
catch (Exception ex)
{
}
return csvData;
}
This is an example of what the data looks like:
Is there a way that I can work around this and make this work?
----- EDIT, Add data sample as text --------
Name,Name,Email,Manager Level1,Level 1 manager's email,Manager Level2,Level 2 manager's email
Adams, D. (Deon) ,"Adams, Deon. (D) ",username#email.com,"Masete, Thabo (B.T.)",username#email.com,"Fraser, Mervyn (M.)",username#email.com
Akaramunkongwanit, S. (Sirapra) ,"Akaramunkongwanit, Sirapra (S.)",username#email.com> ,"Naraphirom, Suphajitphat (Pin.)",username#email.com,"Jeeradeepalung, Jirawat (Jee.)",username#email.com
Angel, L. (Dave) ,"Angel, Dave (L.) ",username#email.com,"Causton, Keith (K.H.) ",username#email.com,"White, Chris- Manf Eng (C.F.) ",username#email.com
Apairat, J. (Janjira),"Apairat, Janjira (J.) "username#email.com,"Choksiriwanna, Phatthar (Patsy.)",username#email.com,"Phusitpoykai, Rachawan (R.) ",username#email.com
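One possible workaround, sketched under the assumption that the only unquoted field containing a comma is the leading "Surname, Initials" name and that rows otherwise follow the header: let TextFieldParser honour the quotes (HasFieldsEnclosedInQuotes = true), then glue the first two fields back together whenever a row comes out exactly one field longer than the header. The names LenientCsv, ParseRows and MergeLeadingName are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO; // TextFieldParser lives here

public static class LenientCsv
{
    // Hypothetical helper: if a row has exactly one field more than the header,
    // assume the unquoted leading name ("Adams, D. (Deon)") was split on its
    // comma and join the first two fields again.
    public static string[] MergeLeadingName(string[] fields, int expectedCount)
    {
        if (fields.Length != expectedCount + 1)
            return fields; // nothing to repair (or too broken to guess)

        var merged = new string[expectedCount];
        merged[0] = fields[0] + ", " + fields[1];
        Array.Copy(fields, 2, merged, 1, expectedCount - 1);
        return merged;
    }

    // Reads CSV text with quote-aware parsing; the first row is the header.
    public static List<string[]> ParseRows(TextReader reader)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // keep quoted "Adams, Deon" as one field
            parser.TrimWhiteSpace = true;

            string[] header = parser.ReadFields();
            if (header == null) return rows; // empty input
            rows.Add(header);

            while (!parser.EndOfData)
            {
                rows.Add(MergeLeadingName(parser.ReadFields(), header.Length));
            }
        }
        return rows;
    }
}
```

Each returned array can then be passed to csvData.Rows.Add as before. Rows that are genuinely malformed, such as a closing quote that is not followed by a comma, would likely still be rejected by the parser (it throws MalformedLineException) and need their own handling.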

Related

Import Text File into SQL Server Database using C#

I am trying to import a text file into a SQL Server database. The import runs, but all the columns in the text file are being inserted into a single column.
I need the columns from the text file to map to the columns in the SQL table.
Here is my code:
Console.WriteLine(s);
string fileName = s.ToString();
string fullPath = path + fileName.ToString();
DataTable dt = new DataTable();
dt.Columns.AddRange(new DataColumn[3] { new DataColumn("Environment", typeof(string)),
new DataColumn("Job_Name", typeof(string)),
new DataColumn("Occurs",typeof(string)) });
string csvData = File.ReadAllText(fullPath);
foreach (string row in csvData.Split('\n'))
{
if (!string.IsNullOrEmpty(row))
{
dt.Rows.Add();
int i = 0;
foreach (string cell in row.Split(','))
{
dt.Rows[dt.Rows.Count - 1][i] = cell;
i++;
}
}
}
string consString = ConfigurationManager.ConnectionStrings["myConn"].ConnectionString;
using (SqlConnection con = new SqlConnection(consString))
{
using (SqlBulkCopy sqlBulkCopy = new SqlBulkCopy(con))
{
//Set the database table name
sqlBulkCopy.DestinationTableName = "[dbo].[test2]";
con.Open();
sqlBulkCopy.WriteToServer(dt);
con.Close();
}
}
You are splitting your rows on a comma when your data is tab-separated. Instead do this:
row.Split('\t')
Also, don't split the entire file on \n (with Windows line endings that leaves a trailing \r on every row); use File.ReadAllLines, for example:
foreach (string row in File.ReadAllLines(fullPath))
{
if (!string.IsNullOrEmpty(row))
{
dt.Rows.Add();
int i = 0;
foreach (string cell in row.Split('\t'))
{
dt.Rows[dt.Rows.Count - 1][i] = cell;
i++;
}
}
}
The C# function below imports a comma-delimited file into a DataTable. Once the data is in the DataTable, you can use whichever method you prefer (bulk insert or row by row) to load it into the database:
public static DataTable ImportDataFromCSVFile(string filePath)
{
DataTable dataTable = new DataTable();
try
{
using (StreamReader readFile = new StreamReader(filePath))
{
string line;
StringBuilder sb = new StringBuilder();
string[] row;
int counter = 0;
int length = 0;
while ((line = readFile.ReadLine()) != null)
{
row = line.Split(',');
if (counter == 0)
{
length = row.Length;
DataRow dr1 = dataTable.NewRow();
for (int i = 0; i < length; i++)
{
try
{
//dataTable.Columns.Add("Col_" + i.ToString());
dataTable.Columns.Add(Convert.ToString(row[i]));
}
catch (Exception ex)
{
}
}
// dataTable.Rows.Add(dr1);
}
else
{
if (row.Length == dataTable.Columns.Count)
{
DataRow dr = dataTable.NewRow();
for (int i = 0; i < length; i++)
{
if (row[i].ToString().Contains('"'))
{
row[i] = row[i].Replace('"', ' ');
}
dr[i] = Convert.ToString(row[i]);
}
dataTable.Rows.Add(dr);
}
else
{
}
}
counter++;
}
}
}
catch (Exception ex)
{
}
return dataTable;
}

Use Button to Go to Next Available ID in Dataview C#

I have the data below, and I used RowFilter to filter the ID, and only show rows that have ID = 111. However, I need to create a "Next" button to be able to go to the next unique ID which is 222. The IDs are not incremental.
Original Table:
Any tips on how to approach this? I am running out of options
string[] columnnames = file.ReadLine().Split('|');
DataTable dt = new DataTable();
foreach (string c in columnnames)
{
dt.Columns.Add(c);
}
string newline;
while ((newline = file.ReadLine()) != null)
{
DataRow dr = dt.NewRow();
string[] values = newline.Split('|');
for (int i = 0; i < values.Length; i++)
{
dr[i] = values[i];
}
dt.Rows.Add(dr);
}
DataView dv = new DataView(dt);
dv.RowFilter = "ID = '111'";
dataGridView1.DataSource = dv;
I used dv.RowFilter = "ID = '111'"; however how can I make it dynamic so it can go to the next ID= 222?
Thanks
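One way to approach this, as a rough sketch: collect the distinct ID values once, in the order they first appear, keep an index into that list, and rebuild the RowFilter each time "Next" is clicked. The IdNavigator class is hypothetical; it assumes the column is named ID and holds string values, as in the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public class IdNavigator
{
    private readonly DataView _view;
    private readonly string _idColumn;
    private readonly List<string> _ids = new List<string>();
    private int _position; // index of the ID currently shown

    public IdNavigator(DataTable table, string idColumn = "ID")
    {
        _view = new DataView(table);
        _idColumn = idColumn;

        // Remember each distinct ID in first-seen order.
        var seen = new HashSet<string>();
        foreach (DataRow row in table.Rows)
        {
            string id = Convert.ToString(row[idColumn]);
            if (seen.Add(id)) _ids.Add(id);
        }
        ApplyFilter();
    }

    // Bind this to the grid, e.g. dataGridView1.DataSource = navigator.View;
    public DataView View { get { return _view; } }

    // Call from the Next button's Click handler; wraps to the first ID at the end.
    public void MoveNext()
    {
        if (_ids.Count == 0) return;
        _position = (_position + 1) % _ids.Count;
        ApplyFilter();
    }

    private void ApplyFilter()
    {
        if (_ids.Count == 0) return;
        // Double any single quotes so the value is a valid filter literal.
        string escaped = _ids[_position].Replace("'", "''");
        _view.RowFilter = string.Format("[{0}] = '{1}'", _idColumn, escaped);
    }
}
```

Because the grid stays bound to the same DataView, changing RowFilter is enough to refresh what it shows; wrapping around at the end is a design choice here and could be replaced by disabling the button on the last ID.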

Excel duplicate check while uploading to SQL

I have two tables.
I want to check the Excel sheet values, which are loaded into a DataSet, against the values in the database.
How can I do this check?
Here is the code:
lblmsg.Text = "";
try
{
//System.Threading.Thread.Sleep(5000);
int stateid = 0, cityid = 0;
DataTable dtbank = new DataTable();
DataSet ds = new DataSet();
if (fildetails.HasFile)
{
string fileExtension = System.IO.Path.GetExtension(fildetails.FileName);
if (fileExtension == ".xls" || fileExtension == ".xlsx")
{
string fileLocation = Server.MapPath("/NewFolder1/") + fildetails.FileName;
if (System.IO.File.Exists(fileLocation))
{
// System.IO.File.Delete(fileLocation);
}
fildetails.SaveAs(fileLocation);
string excelConnectionString = string.Empty;
excelConnectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" +
fileLocation + ";Extended Properties=\"Excel 12.0;HDR=Yes;IMEX=2\"";
//connection String for xls file format.
if (fileExtension == ".xls")
{
excelConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" +
fileLocation + ";Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=2\"";
}
//connection String for xlsx file format.
else if (fileExtension == ".xlsx")
{
excelConnectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" +
fileLocation + ";Extended Properties=\"Excel 12.0;HDR=Yes;IMEX=2\"";
}
//Create Connection to Excel work book and add oledb namespace
OleDbConnection excelConnection = new OleDbConnection(excelConnectionString);
excelConnection.Open();
DataTable dt = new DataTable();
dt = excelConnection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
if (dt == null)
{
return;
}
String[] excelSheets = new String[dt.Rows.Count];
int t = 0;
//excel data saves in temp file here.
foreach (DataRow row in dt.Rows)
{
string x = row["TABLE_NAME"].ToString();
if (x != "Sheet1$_" && x != "Sheet2$_" && x != "Sheet3$_" && x != "Sheet4$_" && x != "Sheet5$_")
{
excelSheets[t] = row["TABLE_NAME"].ToString();
t++;
}
}
OleDbConnection excelConnection1 = new OleDbConnection(excelConnectionString);
int totalsheet = excelSheets.Length;
for (int i = 0; i < totalsheet; i++)
{
string query = string.Format("Select * from [{0}]", excelSheets[i]);
using (OleDbDataAdapter dataAdapter = new OleDbDataAdapter(query, excelConnection1))
{
dataAdapter.Fill(ds);
}
}
}
if (fileExtension.ToString().ToLower().Equals(".xml"))
{
string fileLocation = Server.MapPath("~/Content/") + Request.Files["FileUpload"].FileName;
if (System.IO.File.Exists(fileLocation))
{
System.IO.File.Delete(fileLocation);
}
Request.Files["FileUpload"].SaveAs(fileLocation);
XmlTextReader xmlreader = new XmlTextReader(fileLocation);
// DataSet ds = new DataSet();
ds.ReadXml(xmlreader);
xmlreader.Close();
}
At this point the Excel values are in ds. What should I do next?
How do I check them against the database?
I have tried this:
BL objbankbl=new BL();
for (int j = 0; j < ds.Tables.Count; j++)
{
for (int i = 0; i < ds.Tables[j].Rows.Count; i++)
{
////city_name///
if (!DBNull.Value.Equals(ds.Tables[j].Rows[i][0]))
{
// dtbank = objbankbl.GetReportDate("","","", ds.Tables[j].Rows[i][0].ToString(), "", "", "","","");
dtbank = objbankbl.GetReportDate(ds.Tables[j].Rows[i][0].ToString());
if (dtbank.Rows.Count > 0 && ( ds.Tables[j].Rows[i][0].ToString() == dtbank.Rows[j]["Name"]) )
{
stateid = Convert.ToInt32(dtbank.Rows[0]["ID"]);
}
else
{
string bankname = ds.Tables[j].Rows[i][0].ToString();
if (bankname != " " || bankname != null)
{
//stateid = objbankbl.Insert(1, ds.Tables[j].Rows[i][0].ToString(), "", "", 0, "", 0);
}
}
}
DataTable dt = new DataTable();
dt.Columns.Add();
dt.Columns.Add();
dt.Columns.Add();
dt.Rows.Add(1, "Test1", "Sample1");
dt.Rows.Add(2, "Test2", "Sample2");
dt.Rows.Add(3, "Test3", "Sample3");
dt.Rows.Add(4, "Test4", "Sample4");
dt.Rows.Add(5, "Test5", "Sample5");
var duplicates = dt.AsEnumerable().GroupBy(r => r[0]).Where(gr => gr.Count() > 1).ToList();
Console.WriteLine("Duplicate found: {0}", duplicates.Any());
dt.Rows.Add(1, "Test6", "Sample6"); // Duplicate on 1
dt.Rows.Add(1, "Test6", "Sample6"); // Duplicate on 1
dt.Rows.Add(3, "Test6", "Sample6"); // Duplicate on 3
dt.Rows.Add(5, "Test6", "Sample6"); // Duplicate on 5
duplicates = dt.AsEnumerable().GroupBy(r => r[0]).Where(gr => gr.Count() > 1).ToList();
if (duplicates.Any())
Console.WriteLine("Duplicate found for Classes: {0}", String.Join(", ", duplicates.Select(dupl => dupl.Key)));
Console.ReadLine();
I hope this example helps.
This can be handled easily with the DataView.ToTable method. The syntax is below.
DataView.ToTable(bool distinct, string[] columnNames)
distinct: If true, the returned DataTable contains only rows that have distinct values across the columns named in the second parameter. Default value is false.
columnNames: A string array that lists the column names to include in the returned table. The columns in the returned table appear in the same order as in the array.
Ex1
DataTable temp = dt.DefaultView.ToTable(true, "Region");
Ex2
DataTable temp = dt.DefaultView.ToTable(true, "Region", "City");
There are several ways to make this work; the first two that come to mind are Hashtables and LINQ expressions.
Take a look at this: Best way to remove duplicate entries from a data table. Instead of removing the duplicates (look at the second foreach), you print the message.
public void CheckDuplicateRows(DataTable dTable, string colName)
{
    Hashtable hTable = new Hashtable();
    ArrayList duplicateList = new ArrayList();
    // Add each unique value to the hashtable (which stores key/value pairs)
    // and add every row whose value was already seen to the ArrayList.
    foreach (DataRow drow in dTable.Rows)
    {
        if (hTable.Contains(drow[colName]))
            duplicateList.Add(drow);
        else
            hTable.Add(drow[colName], string.Empty);
    }
    // A non-empty list means at least one duplicate was found.
    // (ArrayList exposes Count as a property, not a method.)
    if (duplicateList.Count > 0)
    {
        // you can print your message here or eventually get info about the duplicate row
    }
}

Fill a Data table from a text file

I have this block of code, which reads a tab-delimited text file and returns a DataTable. The problem is that it treats the first line of the file as the header and only the remaining lines as records, so the record count is one less than it should be.
Now I want the code to read all of the file's content as records.
Here is the code:
StreamReader reader = new StreamReader(filePath);
string line = reader.ReadLine();
DataTable dt = new DataTable();
DataRow row;
string[] value = line.Split('\t');
foreach (string dc in value)
{
    dt.Columns.Add(new DataColumn(dc));
}
while (!reader.EndOfStream)
{
    value = reader.ReadLine().Split('\t');
    if (value.Length == dt.Columns.Count)
    {
        row = dt.NewRow();
        row.ItemArray = value;
        dt.Rows.Add(row);
    }
}
return dt;
When I try to remove
foreach (string dc in value)
{
    dt.Columns.Add(new DataColumn(dc));
}
and all the lines of code that depend on it, dt comes back empty.
How can I solve this?
If you know that there is no header, I assume you don't know the column names either, do you? Then you need to add "anonymous" columns instead:
DataTable dt = new DataTable();
while (reader.Peek() >= 0)
{
string line = reader.ReadLine();
string[] fields = line.Split('\t');
if (dt.Columns.Count == 0)
{
foreach (string field in fields)
{
// will add default names like "Column1", "Column2", and so on
dt.Columns.Add();
}
}
dt.Rows.Add(fields);
}
DataTable dt = new DataTable();
var lines = File.ReadAllLines(strPath).ToList();
dt.Columns.AddRange(lines.First().Split(new char[] { '\t' }).Select(col => new DataColumn(col)).ToArray());
lines.RemoveAt(0);
lines.Select(x => x.Split(new char[] { '\t' })).ToList().ForEach(row => dt.Rows.Add(row));
Use this for test data
List<string> lines = new List<string>();
lines.Add("Col1\tCol2\tCol3");
lines.Add("aaa\tbbb\tccc");

Merge datatables but ignore duplicated rows

I have the following code; it's a custom people picker for SharePoint 2010.
It searches by username, but also by the person's name.
Because it's a contains search, if I try part of my username: cia
it shows me duplicated rows, because the term matches the username and also the person's name.
This is my code (I can't use LINQ):
protected override int IssueQuery(string search, string groupName, int pageIndex, int pageSize)
{
try
{
// Find any user that has a matching name
var table = ADHelper.ExecuteNameQuery(RootPath, search);
// 20249: Search by username, method was already done, but it was not being called.
var table2 = ADHelper.ExecutesAMAccountNameQuery(search);
table2.Merge(table);
PickerDialog.Results = table2;
DataTable.Merge only collapses rows when it can match them by primary key; without a primary key on the tables, the rows from the second table are simply appended, duplicates included.
I'm not sure if there is something simpler (you've mentioned that you cannot use LINQ), but you could merge both and remove the duplicates afterwards:
List<string> dupColumns = new List<string>();
dupColumns.Add("ColumnA");
dupColumns.Add("ColumnB");
table2.Merge(table);
RemoveDuplicates(table2, dupColumns);
And here is the remove-duplicates function:
private void RemoveDuplicates(DataTable table, List<string> keyColumns)
{
    Dictionary<string, string> uniquenessDict = new Dictionary<string, string>(table.Rows.Count);
    System.Text.StringBuilder sb = null;
    int rowIndex = 0;
    DataRow row;
    DataRowCollection rows = table.Rows;
    while (rowIndex < rows.Count)
    {
        row = rows[rowIndex];
        // Build a composite key from the values of the key columns.
        sb = new System.Text.StringBuilder();
        foreach (string colname in keyColumns)
        {
            // Convert.ToString copes with DBNull and non-string columns;
            // the separator keeps "ab"+"c" distinct from "a"+"bc".
            sb.Append(Convert.ToString(row[colname])).Append('\u001F');
        }
        if (uniquenessDict.ContainsKey(sb.ToString()))
        {
            // Key already seen: drop the row and stay on the same index.
            rows.Remove(row);
        }
        else
        {
            uniquenessDict.Add(sb.ToString(), string.Empty);
            rowIndex++;
        }
    }
}
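An alternative that may avoid the clean-up pass, assuming both tables share a column that uniquely identifies a user: DataTable.Merge matches rows by primary key, so setting PrimaryKey on both tables before merging makes rows with the same key overwrite each other instead of being appended. The column names AccountName and DisplayName and the helper class are invented for this sketch; the tables returned by ADHelper may use different columns.

```csharp
using System;
using System.Data;

public static class MergeByKeyDemo
{
    // Builds a two-column table keyed on the (hypothetical) AccountName column.
    public static DataTable MakeTable(params string[][] rows)
    {
        var table = new DataTable();
        table.Columns.Add("AccountName", typeof(string));
        table.Columns.Add("DisplayName", typeof(string));
        table.PrimaryKey = new[] { table.Columns["AccountName"] };
        foreach (string[] r in rows)
        {
            table.Rows.Add(r[0], r[1]);
        }
        table.AcceptChanges(); // mark rows as Unchanged, like freshly loaded data
        return table;
    }

    // Merges the name-search results into the username-search results.
    // Rows whose AccountName already exists in the target are matched, not duplicated.
    public static DataTable MergeDistinct(DataTable byUserName, DataTable byName)
    {
        byUserName.Merge(byName);
        return byUserName;
    }
}
```

With real query results, the same effect would come from assigning PrimaryKey on table and table2 right after they are returned and before calling table2.Merge(table), provided the key column never contains duplicates within a single table.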
You should use the .ToTable function.
Here is some sample code:
DataTable DT1 = new DataTable();
DT1.Columns.Add("c_" + DT1.Columns.Count);
DT1.Columns.Add("c_" + DT1.Columns.Count);
DT1.Columns.Add("c_" + DT1.Columns.Count);
DataRow DR = DT1.NewRow();
DR[0] = 0;
DR[1] = 1;
DR[2] = 2;
DT1.Rows.Add(DR);
DataTable DT2 = new DataTable();
DT2.Columns.Add("c_" + DT2.Columns.Count);
DT2.Columns.Add("c_" + DT2.Columns.Count);
DT2.Columns.Add("c_" + DT2.Columns.Count);
DT2.Columns.Add("c_" + DT2.Columns.Count);
DR = DT2.NewRow();
DR[0] = 0;
DR[1] = 1;
DR[2] = 2;
DR[3] = 3;
DT2.Rows.Add(DR);
DT1.Merge(DT2);
Trace.IsEnabled = true;
DataTable DT_3=DT1.DefaultView.ToTable(true,new string[]{"c_1","c_2","c_0"});
foreach (DataRow CDR in DT_3.Rows)
{
Trace.Warn("val",CDR[1]+"");//you will find only one data row
}
