I'm trying to parse values from a text file using C#, then append them as a column to the existing data set coming from the same text file. Example file data:
As of 1/31/2015
1 data data data
2 data data data
So I want to use a script component within an SSIS data flow to append the 1/31/2015 value as a 4th column. This package iterates through several files, so I would like this to take place within each data flow. I have no issues with getting the rest of the data into individual columns in the database, but I parse most of those out using T-SQL after fast loading everything as one big column.
edit:
Here is the .NET code I used to get it started. I know this is probably far from optimal, but I actually want to parse two values from the resulting string, and that would be easy to do with regular expressions.
using System.IO;
using System.Linq;
using System.Windows.Forms;

// Read the first 7 lines of the file and join them into one string
// so the values can be parsed out with regular expressions afterwards.
string[] lines = File.ReadLines(filename).Take(7).ToArray();
string every = string.Join(",", lines);
MessageBox.Show(every);
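For the parsing itself, a rough sketch with a regular expression; the output column name used in the comment is hypothetical:
using System.Text.RegularExpressions;

// Pull the date out of the joined header text.
Match m = Regex.Match(every, @"As of (\d{1,2}/\d{1,2}/\d{4})");
if (m.Success)
{
    string asOfDate = m.Groups[1].Value; // "1/31/2015"
    // In the script component, assign this to an added output column
    // (e.g. Row.AsOfDate) for each row passing through the data flow.
}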
I successfully made a C# class that uses Jet to execute a SELECT string on a csv file. However, I really need an UPDATE and/or INSERT statement, and Jet apparently won't allow it. I've been looking into using LINQ, but can't seem to find any examples of an UPDATE clause for LINQ either. Does anyone know about this, or perhaps a different approach than LINQ that could accomplish it?
Basically, I want to read a csv file into memory and query on it (select columns, or distinct, etc.), which is fine with Jet, but I also want to update rows and modify the text file.
For example:
UPDATE table
SET col3 = 42
WHERE col1 = 'mouse' AND col2 = 'dolphins';
and have that change take effect on the data read from the csv.
Also, I can't figure out how to access columns by name with LINQ. Any advice?
So far, the constructor for my class seems to parse the file OK (I can see it in the watch and immediate windows), but I don't know how to move on from here:
string method = this.ToString();
this.strFilePath = filePath;
this.strDelim = delim;
Logger.Debug("Starting", method);
try
{
    fileLines = File.ReadAllLines(filePath);
    // Split inside the try so a failed read doesn't leave fileLines
    // null when it is used here.
    this.fileTable = fileLines.Select(l => l.Split(delim));
}
catch (Exception ex)
{
    Logger.Error(ex, method);
}
Ignore the 'Logger', this is an in-house class that writes things to a log for our own purposes.
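For the column-by-name part, a minimal sketch building on the constructor above, assuming the first line of the file is a header row:
// Build a column-name -> index map from the header line once,
// then index into each split row by name.
var header = fileTable.First();
var colIndex = header.Select((name, i) => new { name, i })
                     .ToDictionary(x => x.name, x => x.i);

// e.g. the distinct values of a column called "col1":
var distinctCol1 = fileTable.Skip(1)
                            .Select(row => row[colIndex["col1"]])
                            .Distinct();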
What you're asking can't easily be done, due to the way text files are organized.
In general, you can't arbitrarily update a text file like you can a file of records. Consider, for example, the following text file.
line1,43,27,Jim
line2,29,32,Keith
Now, you want to change the 43 on line 1 to 4300. That involves adding two characters to the file, so you end up having to shift everything after the 43 forward by two characters and then insert the 00. But shifting everything by two characters requires extending the file and re-writing all of the remaining text.
Text files are typically used for sequential access: reading and appending. Insertion or deletion affects the entire file beyond the point of modification. Unless you're going to re-write the entire file on every change, you simply do not want to use a text file for holding data that changes frequently.
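If the files are small enough, the usual workaround is exactly that re-write: read every line, apply the change in memory, and write the whole file back. A minimal sketch, assuming comma-delimited columns with no embedded commas or quotes, using the columns from the UPDATE example above:
using System;
using System.IO;
using System.Linq;

// Read all lines, apply the update in memory, write everything back.
var updated = File.ReadAllLines(strFilePath)
    .Select(line => line.Split(','))
    .Select(cols =>
    {
        if (cols[0] == "mouse" && cols[1] == "dolphins")
            cols[2] = "42"; // SET col3 = 42
        return string.Join(",", cols);
    })
    .ToArray();
File.WriteAllLines(strFilePath, updated);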
I have large csv files (from 2GB up to 25GB) which have different structures. I've made a C# app where the user enters the positions of the columns he wants to replace. Now I have to read the csv, compare the columns from the csv to columns that are stored in an Oracle DB table, replace the columns from the csv if the select condition is met, and store the result in a new csv.
Now the questions are: what is the best and fastest way to do this? And what is the best way to access the DB only once for the column-comparison select, rather than once for every csv line?
So far I've used a StreamReader for reading and then splitting the csv lines, but I don't know what the best way to compare the values against the DB would be.
Thanks for advice.
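One way to hit the database only once is to select the comparison column into a HashSet up front and test each csv line against it in memory. A rough sketch, with made-up table, column, and parameter names, assuming the managed Oracle provider:
using System.Collections.Generic;
using System.IO;
using Oracle.ManagedDataAccess.Client; // assumes the managed Oracle provider

// Select the comparison column once, then stream the csv against the
// in-memory set. Table/column names and the delimiter are made up.
static void ReplaceColumn(string connectionString, string inputPath,
                          string outputPath, int columnIndex, string replacement)
{
    var lookup = new HashSet<string>();
    using (var conn = new OracleConnection(connectionString))
    using (var cmd = new OracleCommand("SELECT key_col FROM lookup_table", conn))
    {
        conn.Open();
        using (var reader = cmd.ExecuteReader())
            while (reader.Read())
                lookup.Add(reader.GetString(0));
    }

    using (var sr = new StreamReader(inputPath))
    using (var sw = new StreamWriter(outputPath))
    {
        string line;
        while ((line = sr.ReadLine()) != null)
        {
            var cols = line.Split(';');
            if (lookup.Contains(cols[columnIndex]))
                cols[columnIndex] = replacement;
            sw.WriteLine(string.Join(";", cols));
        }
    }
}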
StreamReader is an easy way, which probably also is not that fast.
If you're able to locate the desired line without iterating over all lines, you may try MemoryMappedFiles (this might also work with streams, as long as they support seeking):
Jump to the middle of the file and check whether you're above or below the desired line.
Then jump to the middle of the remaining half (the upper half if the value you hit is below the target, the lower half if it is above) and repeat.
This converges within a few iterations, even on quite large files.
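A rough sketch of that bisection, using a seekable FileStream rather than a MemoryMappedFile, and assuming the file uses '\n' line endings and is sorted by an integer key in its first comma-separated column:
using System.IO;
using System.Text;

class SortedFileSearch
{
    // Binary-search a sorted text file for the line whose first
    // comma-separated column equals targetKey.
    public static string FindLine(string path, long targetKey)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            long lo = 0, hi = fs.Length;
            while (lo < hi)
            {
                long mid = (lo + hi) / 2;
                long lineStart = SeekToLineStart(fs, mid);
                long nextStart;
                string line = ReadLineAt(fs, lineStart, out nextStart);
                if (line.Length == 0) { hi = lineStart; continue; }
                long key = long.Parse(line.Split(',')[0]);
                if (key == targetKey) return line;
                if (key < targetKey) lo = nextStart; // target is further down
                else hi = lineStart;                 // target is further up
            }
            return null; // not found
        }
    }

    // Walk backwards from pos to the character after the previous '\n'
    // (or to the start of the file).
    static long SeekToLineStart(FileStream fs, long pos)
    {
        while (pos > 0)
        {
            fs.Seek(pos - 1, SeekOrigin.Begin);
            if (fs.ReadByte() == '\n') break;
            pos--;
        }
        return pos;
    }

    // Read the line starting at lineStart; nextStart receives the
    // offset of the following line.
    static string ReadLineAt(FileStream fs, long lineStart, out long nextStart)
    {
        fs.Seek(lineStart, SeekOrigin.Begin);
        var sb = new StringBuilder();
        int b;
        while ((b = fs.ReadByte()) != -1 && b != '\n')
            sb.Append((char)b);
        nextStart = fs.Position;
        return sb.ToString().TrimEnd('\r');
    }
}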
I have got a text file with the following structure:
id=123
name=value
year=2013
The first part (id, name, year) is the name of a column, and the last part is the data that I need to put into that column. I have no idea how to do it.
1. I am reading the file line by line
2. ??
The only idea I have is to replace the '=' to build a query and try to run it, but that looks like a bad idea...
Also, I need to check whether the data is already present in the DB.
Upload your text file
Read it line by line
Update DB
How I would do it:
Upload the text file.
Create a List of objects that holds those values and implements validation.
Read every line, and every three lines try to construct a valid object of that type, using a regex or something similar to distinguish key from value and to catch errors; see the sketch below.
Update the database via LINQ2SQL using the list of objects.
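A minimal sketch of the construction step, assuming every record is exactly three key=value lines (id, name, year) in that order; the class and property names are made up:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class Record
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int Year { get; set; }
}

static List<Record> ParseRecords(string path)
{
    var records = new List<Record>();
    string[] lines = File.ReadAllLines(path);
    for (int i = 0; i + 2 < lines.Length; i += 3)
    {
        // Split each of the three lines into key and value on the first '='.
        var fields = lines.Skip(i).Take(3)
            .Select(l => l.Split(new[] { '=' }, 2))
            .ToDictionary(p => p[0].Trim(), p => p[1].Trim());

        records.Add(new Record
        {
            Id = int.Parse(fields["id"]),
            Name = fields["name"],
            Year = int.Parse(fields["year"])
        });
    }
    return records;
}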
I am a complete newbie to SSIS.
I have a c#/sql server background.
I would like to know whether it is possible to validate data before it goes into a database. I am grabbing text from a pipe (|) delimited text file.
For example, if a certain datapoint is null, then change it to 0, or if a certain datapoint's length is 0, then change it to "nada".
I don't know if this is even possible with SSIS, but it would be most helpful if you could point me in the right direction.
Anything is possible with SSIS!
After your flat file data source, use a Derived Column transformation, deriving a new column with an expression something like the following:
ISNULL(ColumnName) ? "nada" : ColumnName
Then use this new column in your data source destination.
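To also cover both rules from the question (null to 0, zero length to "nada"), the conditional expression can be chained; this is an untested sketch:
ISNULL(ColumnName) ? "0" : (LEN(ColumnName) == 0 ? "nada" : ColumnName)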
Hope it helps.
I don't know if you're dead set on using SSIS, but the basic method I've generally used to import textfile data into a database takes two stages:
Use BULK INSERT to load the file into a temporary staging table on the database server; each of the columns in this staging table is something reasonably tolerant of the data it contains, like varchar(max).
Write up validation routines to update the data in the temporary table and double-check to make sure that it's well-formed according to your needs, then convert the columns into their final formats and push the rows into the destination table.
I like this method mostly because BULK INSERT can be a bit cryptic about the errors it spits out; with a temporary staging table, it's a lot easier to look through your dataset and fix errors on the fly as opposed to rooting through a text file.
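A rough T-SQL sketch of those two stages, with made-up file, table, and column names:
-- Stage 1: bulk load into a tolerant staging table.
CREATE TABLE #Staging (col1 varchar(max), col2 varchar(max), col3 varchar(max));

BULK INSERT #Staging
FROM 'C:\data\input.txt'
WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n');

-- Stage 2: validate/clean in place, then convert and push to the destination.
UPDATE #Staging SET col3 = 'nada' WHERE col3 IS NULL OR LEN(col3) = 0;

INSERT INTO dbo.Destination (col1, col2, col3)
SELECT col1, CAST(col2 AS int), col3
FROM #Staging;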
I have a list of maybe 50,000 entries that is populated in a datagrid in WPF. Now I want to save the data in the list to a text or, preferably, CSV file. Because the list is so big, there is a problem with my implemented methods, whether simple text file writing or copying the contents of the datagrid to the clipboard, reading them back into a string, and then writing that string to a file with a StreamWriter: it takes approximately 4-5 minutes, even running in a background worker.
Is there any way that I can save a huge list to a file quickly?
I am using a DataGrid in WPF.
CODE
dataGrid1.SelectAllCells();
dataGrid1.ClipboardCopyMode = DataGridClipboardCopyMode.IncludeHeader;
ApplicationCommands.Copy.Execute(null, dataGrid1);
string result = (string)Clipboard.GetData(DataFormats.CommaSeparatedValue);
// Execution never reaches the next line; the thread hangs on the line above.
dataGrid1.UnselectAllCells();
Clipboard.Clear();
StreamWriter file = new StreamWriter(SavePageRankToPDF.FileName);
file.WriteLine(result);
file.Close();
Instead of using the clipboard, why not iterate through the datatable and build the csv file yourself?
Update
Here are some examples:
Convert DataTable to CSV stream
Converting DataSet\DataTable to CSV
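Along the same lines, a minimal sketch, assuming the grid is bound to a DataTable; the quoting here is naive, so fields containing commas or quotes would need escaping:
using System.Data;
using System.IO;
using System.Linq;

// Write the DataTable straight to disk, one row at a time.
static void WriteCsv(DataTable table, string path)
{
    using (var writer = new StreamWriter(path))
    {
        // Header row.
        writer.WriteLine(string.Join(",",
            table.Columns.Cast<DataColumn>().Select(c => c.ColumnName)));

        // Data rows.
        foreach (DataRow row in table.Rows)
            writer.WriteLine(string.Join(",", row.ItemArray));
    }
}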
One thing that will help is to not load ALL of your data into the datagrid when using it for display purposes. It'd be a good idea to use paging: only load the data into the datagrid that will be needed for calculations or display purposes. If the user wants to see/use more data, go back to your data source and get more of the data. Not only will your app run faster, you'll use much less memory.
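For example, a sketch of paging with LINQ to SQL; the data context and row type here are hypothetical:
using System.Linq;

// Load one page of rows at a time instead of all 50,000.
const int PageSize = 500;

void LoadPage(int pageIndex)
{
    using (var db = new MyDataContext()) // hypothetical LINQ to SQL context
    {
        dataGrid1.ItemsSource = db.Entries // hypothetical table
            .OrderBy(e => e.Id)
            .Skip(pageIndex * PageSize)
            .Take(PageSize)
            .ToList();
    }
}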