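I need to import an Excel file into SQL Server using SSIS, and there is a complication: two columns in the Excel sheet are associated with each other. Each cell holds several values separated by line breaks, and the nth line in one column belongs with the nth line in the other.
I can use Find/Replace in Excel to replace the line breaks with some delimiter (say &&). Is there any way to do the replacement in C# code?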
Thanks
I made the following spreadsheet to match your data and added an ID column:
Now back to SSIS, I added the following data flow:
Open the Script Component, go to Inputs and Outputs, and define your columns:
Go back to the Script page and click Edit Script.
Paste in the following code to read the spreadsheet and parse it into your outputs:
public override void CreateNewOutputRows()
{
    string fileName = @"D:\imports\survey.xlsx";
    string sheetName = "Bananas";
    // HDR=YES: the first row holds headers; IMEX=1: mixed-type columns are read as text
    string cstr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + fileName + ";Extended Properties=\"Excel 12.0;HDR=YES;IMEX=1\"";
    using (System.Data.OleDb.OleDbConnection xlConn = new System.Data.OleDb.OleDbConnection(cstr))
    {
        xlConn.Open();
        System.Data.OleDb.OleDbCommand xlCmd = xlConn.CreateCommand();
        xlCmd.CommandText = string.Format("SELECT * FROM [{0}$]", sheetName);
        xlCmd.CommandType = System.Data.CommandType.Text;
        using (System.Data.OleDb.OleDbDataReader rdr = xlCmd.ExecuteReader())
        {
            while (rdr.Read())
            {
                int id = Convert.ToInt32(decimal.Parse(rdr[0].ToString()));
                // A line break inside an Excel cell is a LF character
                string[] keys = rdr.GetString(1).Split('\n');
                string[] values = rdr.GetString(2).Split('\n');
                // The nth key pairs with the nth value; stop at the shorter list so a
                // mismatched cell does not throw IndexOutOfRangeException
                int pairCount = Math.Min(keys.Length, values.Length);
                for (int i = 0; i < pairCount; i++)
                {
                    Output0Buffer.AddRow();
                    Output0Buffer.ID = id;
                    Output0Buffer.key = keys[i].TrimEnd('\r');
                    Output0Buffer.Pair = values[i].TrimEnd('\r');
                }
            }
        }
    }
}
Finally, the output contains one row per key/value pair, each tagged with its ID.
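If you only need the string handling the question asks about (for example, to test it outside SSIS), it can be sketched in plain C#. CellText, ReplaceLineBreaks and PairLines are hypothetical names, and the sketch assumes the cell text uses LF or CRLF line breaks.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CellText
{
    // Replace every line break (CRLF, CR or LF) in a cell with a delimiter such as "&&"
    public static string ReplaceLineBreaks(string cell, string delimiter)
    {
        return Regex.Replace(cell ?? string.Empty, "\r\n|\r|\n", delimiter);
    }

    // Split two associated cells on line breaks and pair the nth key with the nth value.
    // Extra lines in the longer cell are ignored.
    public static List<KeyValuePair<string, string>> PairLines(string keyCell, string valueCell)
    {
        // "\r\n" is listed first so a CRLF is treated as one separator, not two
        string[] separators = { "\r\n", "\r", "\n" };
        string[] keys = (keyCell ?? string.Empty).Split(separators, StringSplitOptions.None);
        string[] values = (valueCell ?? string.Empty).Split(separators, StringSplitOptions.None);

        var pairs = new List<KeyValuePair<string, string>>();
        for (int i = 0; i < Math.Min(keys.Length, values.Length); i++)
        {
            pairs.Add(new KeyValuePair<string, string>(keys[i], values[i]));
        }
        return pairs;
    }
}
```

Splitting directly (PairLines) avoids the intermediate && step altogether; ReplaceLineBreaks is only useful if a later step really expects the delimited form.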
I need to check Excel files for a sheet containing a specific header in C#.
The sheet names, order and quantity are variable. I need to check all the sheets for the one containing a specific header and then store the name in a variable for later processing.
I can currently get all the sheet names, but I cannot check whether a sheet contains the header I am looking for.
The goal is to get the sheet name and insert it into a SQL statement that processes the file in an SSIS package.
This is the code I am currently using:
public void Main()
{
string excelFile;
string connectionString;
OleDbConnection excelConnection;
DataTable tablesInFile;
string[] excelTables = new string[5];
excelFile = Dts.Variables["User::var_MonitoringFile"].Value.ToString();
connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;" +
"Data Source= "+ excelFile +
";Extended Properties=\"EXCEL 8.0;HDR=YES; IMEX=1;\"";
excelConnection = new OleDbConnection(connectionString);
excelConnection.Open();
tablesInFile = excelConnection.GetSchema("Tables");
DataRow tableInFile = tablesInFile.Rows[0];
Dts.Variables["User::var_ExcelSheet"].Value = tableInFile["TABLE_NAME"].ToString();
excelConnection.Close();
Dts.TaskResult = (int)ScriptResults.Success;
}
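One way to pick the right sheet is to loop over every TABLE_NAME that GetSchema("Tables") returns, read the first row of each sheet (with HDR=NO so the header row comes back as data), and stop at the first sheet whose first row contains the header. The matching part is sketched below without OleDb so it runs on its own. SheetFinder, IsWorksheet and FindSheetWithHeader are hypothetical names, and the sketch assumes worksheet names come back ending in $ (or $' when the name contains spaces and is quoted), while named ranges and hidden filter ranges do not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SheetFinder
{
    // Worksheets look like "Sheet1$" or "'My Sheet$'"; named ranges such as
    // "MyRange" or "Sheet1$_FilterDatabase" do not end that way.
    public static bool IsWorksheet(string tableName)
    {
        return tableName.EndsWith("$") || tableName.EndsWith("$'");
    }

    // headerRows maps each TABLE_NAME to the values read from that sheet's first row.
    // Returns the first worksheet whose first row contains the header (case-insensitive), or null.
    public static string FindSheetWithHeader(IEnumerable<KeyValuePair<string, string[]>> headerRows, string header)
    {
        foreach (var sheet in headerRows)
        {
            if (!IsWorksheet(sheet.Key))
            {
                continue;
            }
            bool found = sheet.Value.Any(cell =>
                string.Equals((cell ?? string.Empty).Trim(), header, StringComparison.OrdinalIgnoreCase));
            if (found)
            {
                return sheet.Key;
            }
        }
        return null;
    }
}
```

The returned name can then be stored in User::var_ExcelSheet the same way the existing code stores Rows[0].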
I want to programmatically sort an Excel worksheet using C#, but the code I used doesn't work:
//the largest size of sheet in Excel 2010
int maxRowAmount = 1048576;
int maxColAmount = 16384;
//Sort by the values in column J
sourceWorkSheet.Sort.SortFields.Add(sourceWorkSheet.Range["J:J"], XlSortOn.xlSortOnValues, XlSortOrder.xlAscending, XlSortDataOption.xlSortNormal);
//Find out the last used row and column, then set the range to sort,
//the range is from cell[2,1](top left) to the bottom right corner
int lastUsedRow=sourceWorkSheet.Cells[maxRowAmount, 1].End[XlDirection.xlUp].Row;
int lastUsedColumn=sourceWorkSheet.Cells[2, maxColAmount].End[XlDirection.xlToLeft].Column;
Range r = sourceWorkSheet.Range[sourceWorkSheet.Cells[2, 1], sourceWorkSheet.Cells[lastUsedRow,lastUsedColumn ]];
sourceWorkSheet.Sort.SetRange(r);
//Sort!
sourceWorkSheet.Sort.Apply();
I debugged it by using a MessageBox to print the values in column J, and the result is not sorted:
//print out the sorted result (column J is the 10th column of the used range when the data starts in column A)
Range columnJ = sourceWorkSheet.UsedRange.Columns[10];
System.Array myvalues = (System.Array)columnJ.Cells.Value;
string[] cmItem = myvalues.OfType<object>().Select(o => o.ToString()).ToArray();
String msg = "";
// Print at most the first 30 values so a short sheet does not throw IndexOutOfRangeException
for (int i = 0; i < Math.Min(30, cmItem.Length); i++)
{
    msg = msg + cmItem[i] + "\n";
}
MessageBox.Show(msg);
Why isn't it working?
Thanks
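The solution is to clear the existing sort fields first. The worksheet's Sort object keeps the fields from earlier sorts, and SortFields.Add only appends the new field after them, so call
sourceWorkSheet.Sort.SortFields.Clear();
before
sourceWorkSheet.Sort.SortFields.Add(sourceWorkSheet.Range["J:J"], XlSortOn.xlSortOnValues, XlSortOrder.xlAscending, XlSortDataOption.xlSortNormal);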
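Why does a leftover field matter? The sort fields are applied in the order they were added, so an old first field stays the primary key and the new column J field only breaks ties. The LINQ sketch below is an analogy to show that effect, not Excel code; the rows and the stale key on a hypothetical column G are made-up data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortFieldDemo
{
    // Each row is (G, J). Keys are applied in order, like the entries in SortFields.
    public static List<(int G, int J)> Sort(IEnumerable<(int G, int J)> rows, bool keepStaleKeyOnG)
    {
        if (keepStaleKeyOnG)
        {
            // A field left over from an earlier sort on G is still first, so it decides the order
            return rows.OrderBy(r => r.G).ThenBy(r => r.J).ToList();
        }
        // After SortFields.Clear() only the column J field remains
        return rows.OrderBy(r => r.J).ToList();
    }
}
```

With the stale key in place, rows whose G values differ never get reordered by J, which matches the "not sorted" symptom.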
Your code opens Excel and reads from it, so the sheets come back in their original workbook order (not sorted alphabetically). You can use the following code to get the sheets in sorted order:
string connectionString = string.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 8.0;HDR=No;\"", filePath);
DataTable tableOfData;
using (OleDbConnection connection = new OleDbConnection(connectionString))
using (OleDbCommand command = connection.CreateCommand())
{
    connection.Open();
    // The OleDb schema table lists the sheets sorted by TABLE_NAME, so Rows[0] is the first sheet alphabetically
    DataTable sheets = connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
    string tableName = sheets.Rows[0]["TABLE_NAME"].ToString();
    tableOfData = new DataTable();
    command.CommandText = "SELECT * FROM [" + tableName + "]";
    using (OleDbDataReader reader = command.ExecuteReader())
    {
        tableOfData.Load(reader);
    }
}
Here's my situation. I'm designing a program that takes Excel files (which may be in csv, xls, or xlsx format) from a remote network drive, processes the data, then outputs and stores the results of that process. The program provides a listbox of filenames that are obtained from the remote network drive folder using the method detailed in the accepted answer here. Once the user selects a filename from the listbox, I want the program to find the file and obtain the information from it to do the data processing. I have tried using this method to read the data from the Excel file while in a threaded security context, but that method just fails without raising any kind of error; it never seems to terminate. Am I going about this the wrong way?
Edit - (Final Notes: I have taken out the OleDbDataAdapter and replaced it with EPPlus handling.)
I was able to scrub sensitive data from the code, so here it is:
protected void GetFile(object principalObj)
{
if (principalObj == null)
{
throw new ArgumentNullException("principalObj");
}
IPrincipal principal = (IPrincipal)principalObj;
Thread.CurrentPrincipal = principal;
WindowsIdentity identity = principal.Identity as WindowsIdentity;
WindowsImpersonationContext impersonationContext = null;
if (identity != null)
{
impersonationContext = identity.Impersonate();
}
try
{
string fileName = string.Format("{0}\\" + Files.SelectedValue, @"RemoteDirectoryHere");
string connectionString = string.Format("Provider=Microsoft.ACE.OLEDB.14.0; data source={0}; Extended Properties=Excel 14.0;", fileName);
OleDbDataAdapter adapter = new OleDbDataAdapter("SELECT * FROM Sheet1", connectionString);
DataSet ds = new DataSet();
adapter.Fill(ds, "Sheet1");
dataTable = ds.Tables["Sheet1"];
}
finally
{
if (impersonationContext != null)
{
impersonationContext.Undo();
}
}
}
Additional Edit
Now xlsx files have been added to the mix.
Third Party
Third party solutions are not acceptable in this case (unless they allow unrestricted commercial use).
Attempts - (Final Notes: Ultimately I had to abandon OleDb connections.)
I have tried all of the different connection strings offered, and I have tried them with just one file type at a time. None of the connection strings worked with any of the file types.
Permissions
The user does have access to the file and its directory.
Your connection string might be the issue here. As far as I know, there isn't one connection string that can read xls, csv and xlsx alike, and I think you're using the XLSX connection string.
When I read .xls files, I use the following connection string:
@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + sFilePath + ";Extended Properties='Excel 8.0;HDR=YES;IMEX=1;'"
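Since no single connection string covers all three types, one option is to choose it from the file extension. This is a sketch assuming the ACE 12.0 provider is installed (it can also open .xls files, as an alternative to Jet 4.0, which is 32-bit only); ConnectionStrings.ForFile is a hypothetical helper. For CSV the text driver treats the folder as the data source and the file name as the table.

```csharp
using System;
using System.IO;

public static class ConnectionStrings
{
    // Picks an ACE OLE DB connection string based on the file extension
    public static string ForFile(string path, bool hasHeader)
    {
        string hdr = hasHeader ? "YES" : "NO";
        string extension = Path.GetExtension(path).ToLowerInvariant();
        switch (extension)
        {
            case ".xls":
                return "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path +
                       ";Extended Properties=\"Excel 8.0;HDR=" + hdr + ";IMEX=1\"";
            case ".xlsx":
                return "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path +
                       ";Extended Properties=\"Excel 12.0 Xml;HDR=" + hdr + ";IMEX=1\"";
            case ".csv":
                // Data Source is the folder; query with SELECT * FROM [file.csv]
                return "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + Path.GetDirectoryName(path) +
                       ";Extended Properties=\"text;HDR=" + hdr + ";FMT=Delimited\"";
            default:
                throw new NotSupportedException("Unsupported file type: " + extension);
        }
    }
}
```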
Having said that, I recommend using a third-party reader/parser for XLS and CSV because, in my experience, OleDbDataAdapter behaves inconsistently depending on the types of data being read (and how mixed they are within each column).
For XLS, try NPOI https://code.google.com/p/npoi/
For CSV, try http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
For XLSX, try EPPlus http://epplus.codeplex.com/
I've had great success with the above libraries.
Is it really important that you use an OleDb interface for this? I've always done it with Microsoft.Office.Interop.Excel, to wit:
using System;
using Microsoft.Office.Interop.Excel;
namespace StackOverflowExample
{
    class Program
    {
        static void Main(string[] args)
        {
            var app = new Application();
            Workbook wkbk = app.Workbooks.Open(@"c:\data\foo.xls");
            try
            {
                var wksht = wkbk.Sheets[1] as Worksheet; // not zero-based!
                for (int row = 1; row <= 100; row++) // not zero-based!
                {
                    Console.WriteLine("This is row #" + row);
                    for (int col = 1; col <= 100; col++)
                    {
                        Console.WriteLine("This is col #" + col);
                        // Use the two-argument indexer to address the single cell at (row, col)
                        var cell = wksht.Cells[row, col] as Range;
                        if (cell != null)
                        {
                            object val = cell.Value;
                            if (val != null)
                            {
                                Console.WriteLine("The value of the cell is " + val);
                            }
                        }
                    }
                }
            }
            finally
            {
                // Close the workbook without saving and ask Excel to quit
                wkbk.Close(false);
                app.Quit();
            }
        }
    }
}
Since you will be dealing with the .xlsx extension, you should use the newer ACE connection string:
public static string getConnectionString(string fileName, bool HDRValue, bool WriteExcel)
{
string hdrValue = HDRValue ? "YES" : "NO";
string writeExcel = WriteExcel ? string.Empty : "IMEX=1";
return "Provider=Microsoft.ACE.OLEDB.12.0;" + "Data Source=" + fileName + ";" + "Extended Properties=\"Excel 12.0 xml;HDR=" + hdrValue + ";" + writeExcel + "\"";
}
The code above builds the connection string. The first argument is the full path to the file. The second decides whether the values in the first row are treated as column headers. The third decides whether you open the connection to create and write data or only to read it; to only read the data, set it to false, which adds IMEX=1.
public static DataTable ReadData(string filePath, string sheetName, List<string> fieldsToRead, int startPoint, int endPoint)
{
    // HDR=NO treats the first row as data; IMEX=1 opens the workbook for reading
    string connectionString = ProcessFile.getConnectionString(filePath, false, false);
    using (OleDbConnection cn = new OleDbConnection(connectionString))
    {
        cn.Open();
        DataTable dbSchema = cn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
        if (dbSchema == null || dbSchema.Rows.Count < 1)
        {
            throw new Exception("Error: Could not find any worksheet in the file.");
        }
        // sheetName must end with '$' (e.g. "Sheet1$"); fieldsToRead[0] and fieldsToRead[1] are the
        // first and last column letters, so the query reads a block such as [Sheet1$A1:D100]
        string query = "SELECT * FROM [" + sheetName + fieldsToRead[0].ToUpper() + startPoint + ":" + fieldsToRead[1].ToUpper() + endPoint + "]";
        DataTable dt = new DataTable(sheetName);
        using (OleDbDataAdapter da = new OleDbDataAdapter(query, cn))
        {
            da.Fill(dt);
        }
        return dt;
    }
}
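The range syntax used in the query ([Sheet1$A1:D100]) is easy to get wrong when building it by hand. A minimal sketch of a builder, assuming plain sheet names without spaces; RangeQuery.Build is a hypothetical helper that appends the $ if the caller left it off.

```csharp
using System;

public static class RangeQuery
{
    // Builds e.g. SELECT * FROM [Sheet1$A1:D100] for the ACE/Jet Excel provider.
    // A worksheet name must end in '$' before the cell range is appended.
    public static string Build(string sheetName, string firstColumn, int firstRow, string lastColumn, int lastRow)
    {
        if (firstRow < 1 || lastRow < firstRow)
        {
            throw new ArgumentOutOfRangeException(nameof(lastRow), "Row numbers must be 1-based and in order.");
        }
        string sheet = sheetName.EndsWith("$") ? sheetName : sheetName + "$";
        return "SELECT * FROM [" + sheet + firstColumn.ToUpperInvariant() + firstRow + ":" +
               lastColumn.ToUpperInvariant() + lastRow + "]";
    }
}
```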
This reads an Excel 2007 (.xlsx) workbook into a DataSet:
DataSet ds = new DataSet();
// Replace MyDataSource with the full path of the workbook
string myConnStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=MyDataSource;Extended Properties=\"Excel 12.0;HDR=YES\"";
using (OleDbConnection myConn = new OleDbConnection(myConnStr))
using (OleDbCommand cmd = new OleDbCommand("SELECT * FROM [Sheet1$]", myConn))
using (OleDbDataAdapter adapter = new OleDbDataAdapter(cmd))
{
    // Fill opens the connection if it is closed and closes it again afterwards
    adapter.Fill(ds);
}
return ds;
In my website I am reading a CSV file and parsing it. The CSV does not have column names; it is simply a raw list of comma-separated values.
I take this file and use the OdbcDataReader class to read the rows.
The problem is that when I retrieve the first value, it skips the first row of the CSV. This is probably because it treats the first row as a column header, but in my case there are no column headers, so my first row is always skipped.
How can I retrieve the first row of my CSV?
Here is the screenshot of my CSV:
Here is the code that I am using to parse the CSV.
public string CsvParser()
{
int _nNrRowsProccessed = 0;
string connectionString = @"Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=" + ConfigurationManager.AppSettings["CSVFolder"] + ";";
OdbcConnection conn = new OdbcConnection(connectionString);
try
{
conn.Open();
string strFileName = ConfigurationManager.AppSettings["CSVFile"];
string strSQL = "Select * from " + strFileName;
OdbcCommand cmd = new OdbcCommand();
cmd.Connection = conn;
cmd.CommandText = strSQL;
cmd.CommandType = CommandType.Text;
OdbcDataReader reader = cmd.ExecuteReader();
string strLine = null;
// MasterCalendar_DB.OpenMySQLConnection();
while (reader.Read())
{
// insert data into mastercalendar
strLine = reader[0].ToString();
string strLine1 = reader[1].ToString();
string strLine2 = reader[2].ToString();
string strLine3 = reader[3].ToString();
string[] arLine = strLine.Split(';');
// string strAgencyPropertyID = arLine[0];
// DateTime dt = DateTime.Parse(arLine[1]);
// Int64 nDate = (Int64)Util.ConvertToUnixTimestamp(dt);
// String strAvailability = (arLine[2]);
_nNrRowsProccessed++;
// MasterCalendar_DB.Insert(strAgencyPropertyID, nDate, strAvailability);
}
}
catch (Exception ex)
{
throw; // rethrow without resetting the stack trace
}
finally
{
conn.Close();
// MasterCalendar_DB.CloseMySQLConnection();
}
return "Success";
}
Have a look at the Text Driver page on connectionstrings.com.
Basically, you create a schema.ini file in the same directory as the CSV, which holds various options. One of them is the ColNameHeader option, which takes a boolean.
Example from the site:
[customers.txt]
Format=TabDelimited
ColNameHeader=True
MaxScanRows=0
CharacterSet=ANSI
Check this: Schema.ini File (Text File Driver)
You may need to set ColNameHeader=False.
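The schema.ini file can also be written from code just before the reader runs. A sketch assuming the CSV is comma-delimited and ANSI-encoded; SchemaIni.WriteForCsv is a hypothetical helper, and it overwrites any existing schema.ini in that folder. With ColNameHeader=False the driver should return the first line as data and name the columns F1, F2, and so on.

```csharp
using System;
using System.IO;

public static class SchemaIni
{
    // Writes schema.ini next to the CSV so the text driver treats the first row as data.
    // Returns the path of the file it wrote.
    public static string WriteForCsv(string csvPath)
    {
        string folder = Path.GetDirectoryName(Path.GetFullPath(csvPath));
        string schemaPath = Path.Combine(folder, "schema.ini");
        string[] lines =
        {
            "[" + Path.GetFileName(csvPath) + "]", // section name is the CSV file name
            "Format=CSVDelimited",
            "ColNameHeader=False",
            "MaxScanRows=0",
            "CharacterSet=ANSI"
        };
        File.WriteAllLines(schemaPath, lines);
        return schemaPath;
    }
}
```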
Reference Microsoft.VisualBasic and you can use TextFieldParser (in the Microsoft.VisualBasic.FileIO namespace), which almost certainly has fewer dependencies than your proposed method.
using Microsoft.VisualBasic.FileIO;
using (var parser =
new TextFieldParser(@"c:\data.csv")
{
TextFieldType = FieldType.Delimited,
Delimiters = new[] { "," }
})
{
while (!parser.EndOfData)
{
string[] fields;
fields = parser.ReadFields();
//go go go!
}
}
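TextFieldParser has no notion of a header row, so the first line comes back like any other record. A sketch that reads from any TextReader (so it can be tested with a StringReader), assuming a reference to Microsoft.VisualBasic; CsvRecords.ReadAll is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvRecords
{
    // Reads every record, including the first line
    public static List<string[]> ReadAll(TextReader input)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // a quoted "a,b" stays one field
            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

For the file in the question you would pass new StreamReader(path) instead of a StringReader.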
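Quick and dirty solution: because the driver treats the first row as column headers, that row's values come back as the column names, so you can read them with GetName: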
string strLine = reader.GetName(0);
string strLine1 = reader.GetName(1);
string strLine2 = reader.GetName(2);
string strLine3 = reader.GetName(3);
reader.GetName(int i); // Gets the name of specified column.
I am currently trying to read cells from an Excel spreadsheet, and it seems to reformat cells when I don't want it to. I want the values to come through as plain text. I have read a couple of solutions to this problem and implemented them, but I still have the same issue.
The reader turns dates into numbers and numbers into dates.
Example:
Friday, January 29, 2016 comes out to be : 42398
and
40.00 comes out to be : 2/9/1900 12:00:00 AM
Code:
string stringconn = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + files[0] + ";Extended Properties=\"Excel 12.0;IMEX=1;HDR=NO;TypeGuessRows=0;ImportMixedTypes=Text\"";
using (OleDbConnection conn = new OleDbConnection(stringconn))
using (OleDbDataAdapter da = new OleDbDataAdapter("SELECT * FROM [CUAnswers$]", conn))
{
    DataTable dt = new DataTable();
    da.Fill(dt);
    printdt(dt);
}
I have also tried IMEX=0, HDR=NO and TypeGuessRows=1.
This is how I am printing out the sheet:
public void printdt(DataTable dt) {
int counter1 = 0;
int counter2 = 0;
string temp = "";
foreach (DataRow dataRow in dt.Rows) {
foreach (var item in dataRow.ItemArray) {
temp += " ["+counter1+"]["+counter2+"]"+ item +", ";
counter2++;
}
counter1++;
logger.Debug(temp);
temp = "";
counter2 = 0;
}
}
I had a similar problem, except I was using Interop to read the Excel spreadsheet. This worked for me:
var value = (range.Cells[rowCnt, columnCnt] as Range).Value2;
string str = value as string;
DateTime dt;
if (DateTime.TryParse((value ?? "").ToString(), out dt))
{
// Use the cell value as a datetime
}
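Value2 returns the raw stored value, and for a date cell that is Excel's serial number (42398 for January 29, 2016 in the question). DateTime.FromOADate turns such a number back into a DateTime. A sketch, assuming the caller already knows which columns hold dates; ExcelValue.ToDisplayString is a hypothetical helper. For serials before March 1, 1900, FromOADate is one day earlier than what Excel displays, because of Excel's 1900 leap-year quirk.

```csharp
using System;
using System.Globalization;

public static class ExcelValue
{
    // Converts a raw cell value (as returned by Value2 or an OLE DB reader) to text.
    // Serial numbers in known date columns become yyyy-MM-dd; everything else is left as-is.
    public static string ToDisplayString(object raw, bool isDateColumn)
    {
        if (raw == null || raw is DBNull)
        {
            return string.Empty;
        }
        if (isDateColumn && raw is double serial)
        {
            return DateTime.FromOADate(serial).ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
        }
        return Convert.ToString(raw, CultureInfo.InvariantCulture);
    }
}
```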
Edited to add new ideas
I was going to suggest saving the spreadsheet as comma-separated values. Then Excel converts the cells to text. It is easy to parse a CSV in C#.
That led me to think of how to programmatically do the conversion, which is covered in Convert xls to csv programmatically. Maybe the code in the accepted answer is what you are looking for:
string ExcelFilename = "c:\\ExcelFile.xls";
DataTable worksheets;
string connectionString = @"Provider=Microsoft.Jet.OLEDB.4.0;" + @"Data Source=" + ExcelFilename + ";" + @"Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1""";
using (OleDbConnection connection = new OleDbConnection(connectionString))
{
    connection.Open();
    worksheets = connection.GetSchema("Tables");
    foreach (DataRow row in worksheets.Rows)
    {
        // For Sheets: 0=Table_Catalog,1=Table_Schema,2=Table_Name,3=Table_Type
        // For Columns: 0=Table_Name, 1=Column_Name, 2=Ordinal_Position
        string SheetName = (string)row[2];
        OleDbCommand command = new OleDbCommand(@"SELECT * FROM [" + SheetName + "]", connection);
        OleDbDataAdapter oleAdapter = new OleDbDataAdapter();
        oleAdapter.SelectCommand = command;
        DataTable dt = new DataTable();
        oleAdapter.FillSchema(dt, SchemaType.Source);
        oleAdapter.Fill(dt);
        for (int r = 0; r < dt.Rows.Count; r++)
        {
            DataRow dr = dt.Rows[r];
            // Inspect the .NET type the provider chose for each column (e.g. System.DateTime or System.Double)
            for (int c = 1; c < dt.Columns.Count; c++)
            {
                string type = dr[c].GetType().ToString();
            }
        }
    }
}