LINQ to update or insert on a CSV file - C#

I successfully made a C# class that uses Jet to execute a SELECT string on a CSV file. However, I really need UPDATE and/or INSERT statements, and Jet apparently won't allow them. I've been looking into using LINQ, but can't seem to find any examples of an UPDATE clause for LINQ either. Does anyone know about this, or perhaps a different approach than LINQ that could accomplish it?
Basically, I want to read a CSV file into memory and query it (select columns, distinct, etc.), which is fine with Jet, but I also want to update rows and modify the text file.
Something like:
UPDATE table
SET col3 = 42
WHERE col1 = 'mouse' AND col2 = 'dolphins';
and have that change take effect on the data read from the CSV.
Also, I can't figure out how to access columns by name with LINQ. Any advice?
So far, the constructor for my class seems to parse the file OK (I can see it in the Watch and Immediate windows), but I don't know how to move on from here:
string Method = this.ToString();
this.strFilePath = filePath;
this.strDelim = delim;
Logger.Debug("Starting", Method);
try
{
    fileLines = File.ReadAllLines(filePath);
    // Split each line into its fields; fileTable becomes an
    // IEnumerable<string[]> that later LINQ queries can run against.
    this.fileTable = fileLines.Select(l => l.Split(delim));
}
catch (Exception Ex)
{
    // Keep the Split inside the try: if ReadAllLines throws,
    // fileLines is null and splitting it here would throw again.
    Logger.Error(Ex, Method);
}
Ignore the Logger; it's an in-house class that writes things to a log for our own purposes.

What you're asking can't easily be done, due to the way text files are organized.
In general, you can't arbitrarily update a text file like you can a file of records. Consider, for example, the following text file.
line1,43,27,Jim
line2,29,32,Keith
Now, you want to change the 43 on line 1 to 4300. That involves adding two characters to the file, so everything after the 43 has to shift two characters toward the end before the 00 can be inserted. And that shift means extending the file and re-writing all of the text past the point of change.
Text files are typically used for sequential access: reading and appending. Insertion or deletion affects the entire file beyond the point of modification. Unless you're going to re-write the entire file on every change, you simply do not want to use a text file for holding data that changes frequently.
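That said, if the files are small enough to re-write wholesale, the read-modify-rewrite approach works fine with LINQ. Here is a minimal sketch, not the asker's class: it assumes col1/col2/col3 sit at indexes 0, 1, and 2, and that the delimiter is a comma:
using System.IO;
using System.Linq;

// Equivalent of: UPDATE table SET col3 = 42 WHERE col1 = 'mouse' AND col2 = 'dolphins';
var rows = File.ReadAllLines(filePath)
               .Select(l => l.Split(','))
               .ToList();

// LINQ selects the matching rows; the "update" is a plain assignment.
foreach (var row in rows.Where(r => r.Length >= 3 && r[0] == "mouse" && r[1] == "dolphins"))
    row[2] = "42";

// Re-write the whole file with the modified rows.
File.WriteAllLines(filePath, rows.Select(r => string.Join(",", r)));
As for access by column name: a split line is just a string[], so the usual trick is to build a dictionary from the header row's names to their indexes and look fields up through that.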

Related

How do I append to a CSV file using FileHelpers with multiple record types that have distinct headers?

As the question says, using the FileHelpers library I am attempting to generate a CSV file alongside a report file. The report file may have different (but finite) inputs/data structures, and hence my CSV generation method is not explicitly typed. The CSV contains all of the report data as well as the report's header information. For my headers, I am using the class object properties because they are descriptive enough for my end-use purpose.
My relevant code snippet is below:
// File location, where the .csv goes and gets stored.
string filePath = Path.Combine(destPath, fileName);
// First, write report header details based on header list
Type type = DetermineListType(headerValues);
var headerEngine = new FileHelperEngine(type);
headerEngine.HeaderText = headerEngine.GetFileHeader();
headerEngine.WriteFile(filePath, (IEnumerable<object>)headerValues);
// Next, append the report data below the report header data.
type = DetermineListType(reportData);
var reportDataEngine = new FileHelperEngine(type);
reportDataEngine.HeaderText = reportDataEngine.GetFileHeader();
reportDataEngine.AppendToFile(filePath, (IEnumerable<object>)reportData);
When this is executed, the CSV is successfully generated; however, the .AppendToFile() method does not add the reportDataEngine.HeaderText. From the documentation I do not see this functionality in .AppendToFile(), and I am wondering if anyone has a known work-around, or a suggestion for how to output the headers of two different class objects in a single CSV file using FileHelpers.
The desired output would look something like this, in a single CSV file (this would be one contiguous CSV, obviously; not two separate tables):
Report_Name,Operator,Timestamp
Access Report,User1,14:50:12 28 Dec 2020
UserID,Login_Time,Logout_Time
User4,09:33:23,10:45:34
User2,11:32:11,11:44:11
User4,15:14:22,16:31:09
User1,18:55:32,19:10:10
I have also looked at the MultiRecordEngine in FileHelpers, and while I think it may be helpful, I cannot figure out from the examples how to actually write a multi-record CSV file in the fashion required above, if it is possible at all.
Thank you!
The best way is to merge the columns into one big table, then make your classes match the columns you need so you can separate them back out when reading. CSV only allows the first row to define the column names, and even that is optional depending on your use case. Look at CsvHelper (https://joshclose.github.io/CsvHelper/); it has a lot of built-in features and plenty of examples. Let me know if you need additional help.
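For the two-header layout specifically, one possible work-around with CsvHelper is to let WriteRecords emit the first section (it writes a header once on its own) and write the second section's header by hand. A sketch under assumptions: ReportInfo and AccessRecord are hypothetical classes matching the two sections above, and reportInfos/accessRecords are the corresponding collections:
using System.Globalization;
using System.IO;
using CsvHelper;

// Hypothetical record types for the two sections of the report.
public class ReportInfo { public string Report_Name { get; set; } public string Operator { get; set; } public string Timestamp { get; set; } }
public class AccessRecord { public string UserID { get; set; } public string Login_Time { get; set; } public string Logout_Time { get; set; } }

// Writing both sections into one file:
using (var writer = new StreamWriter(filePath))
using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
    // First section: WriteRecords emits the ReportInfo header plus its rows.
    csv.WriteRecords(reportInfos);

    // Second section: the header must be written explicitly, because the
    // writer only emits a header automatically once.
    csv.WriteHeader<AccessRecord>();
    csv.NextRecord();
    foreach (var record in accessRecords)
    {
        csv.WriteRecord(record);
        csv.NextRecord();
    }
}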

Read large CSV, replace some column values from Oracle DB and write new CSV

I have large CSV files (from 2 GB up to 25 GB) with differing structures. I've made a C# app where the user enters the positions of the columns he wants to replace. Now I have to read the CSV, compare those columns to columns stored in an Oracle DB table, replace the CSV values when the select condition is met, and store the result in a new CSV.
Now the questions are: what is the best and fastest way to do this? And what is the best way to access the DB only once for the column comparison, rather than running a select for every CSV line?
So far I've used StreamReader for reading and then splitting the CSV lines, but I don't know the best way to compare them to the values in the DB.
Thanks for any advice.
StreamReader is an easy way which propably also is not that fast.
If you're able to detect the line number without iterating over all lines you may try with MemoryMappedFiles (might also work with streams if they support seeking)
Then jump into the middle of the file and check if you're above or below the desired line.
Then jump to the middle of the remaining half (upper if hit value is below / lower if hit value is above) and repeat.
This converges with a few iterations even on quite large files.
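On the "access the DB only once" part of the question: a common pattern is to pull the Oracle lookup column into a Dictionary with a single SELECT, then stream the CSV once. A sketch only; the connection string, table and column names, delimiter, and columnIndex (the position the user entered) are all placeholders:
using System;
using System.Collections.Generic;
using System.IO;
using Oracle.ManagedDataAccess.Client; // assumes the managed Oracle provider

// 1. One SELECT loads all lookup values into memory.
var lookup = new Dictionary<string, string>();
using (var conn = new OracleConnection(connectionString))
using (var cmd = new OracleCommand("SELECT old_value, new_value FROM lookup_table", conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
        while (reader.Read())
            lookup[reader.GetString(0)] = reader.GetString(1);
}

// 2. Stream the big CSV line by line; the whole file is never held in memory.
using (var input = new StreamReader(inputPath))
using (var output = new StreamWriter(outputPath))
{
    string line;
    while ((line = input.ReadLine()) != null)
    {
        string[] cols = line.Split(','); // delimiter assumed
        if (lookup.TryGetValue(cols[columnIndex], out string replacement))
            cols[columnIndex] = replacement;
        output.WriteLine(string.Join(",", cols));
    }
}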

C# SSIS - script component within data flow

I'm trying to parse a value from a text file using C#, then append it as a column to the existing data set coming from the same text file. Example file data:
As of 1/31/2015
1 data data data
2 data data data
So I want to use a script component within an SSIS data flow to append the 1/31/2015 value as a fourth column. The package iterates through several files, so I would like this to take place within each data flow. I have no issues getting the rest of the data into individual columns in the database, but I parse most of those out using T-SQL after fast-loading everything as one big column.
Edit:
Here is the .NET code I used to get started. I know it's probably far from optimal, but I actually want to parse two values from the resulting string, and that would be easy to do with regular expressions:
// Read the first few lines and join them into one string that the
// regular expressions can later be run against.
string[] lines = File.ReadLines(filename).Take(7).ToArray();
string every = string.Join(",", lines);
MessageBox.Show(every);
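One way to wire this up inside the script component (a sketch only: Input0Buffer is the designer-generated buffer class, AsOfDate is a hypothetical output column added to the component, and Variables.FileName is assumed to be configured as a read-only variable holding the current file's path) would be to parse the date once in PreExecute and stamp it onto every row:
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Inside the script component's ScriptMain class:
private string asOfDate;

public override void PreExecute()
{
    base.PreExecute();
    // Parse the "As of 1/31/2015" line once per file.
    string firstLine = File.ReadLines(Variables.FileName).First();
    Match m = Regex.Match(firstLine, @"As of (\d{1,2}/\d{1,2}/\d{4})");
    asOfDate = m.Success ? m.Groups[1].Value : string.Empty;
}

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    Row.AsOfDate = asOfDate; // appended as the fourth column on every row
}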

How to validate column before importing into database

I am a complete newbie to SSIS.
I have a c#/sql server background.
I would like to know whether it is possible to validate data before it goes into a database. I am grabbing text from a pipe (|) delimited text file.
For example, if a certain data point is null, change it to 0; or if a certain data point's length is 0, change it to "nada".
I don't know if this is even possible with SSIS, but it would be most helpful if you could point me in the right direction.
Anything is possible with SSIS!
After your flat file source, use a Derived Column Transformation, deriving a new column with an expression along these lines (null becomes "0", empty becomes "nada"):
ISNULL(ColumnName) ? "0" : (LEN(ColumnName) == 0 ? "nada" : ColumnName)
Then use this new column in your data source destination.
Hope it helps.
I don't know if you're dead set on using SSIS, but the basic method I've generally used to import textfile data into a database generally takes two stages:
Use BULK INSERT to load the file into a temporary staging table on the database server; each of the columns in this staging table is something reasonably tolerant of the data it will contain, like varchar(max).
Write up validation routines to update the data in the temporary table and double-check to make sure that it's well-formed according to your needs, then convert the columns into their final formats and push the rows into the destination table.
I like this method mostly because BULK INSERT can be a bit cryptic about the errors it spits out; with a temporary staging table, it's a lot easier to look through your dataset and fix errors on the fly as opposed to rooting through a text file.
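A hedged sketch of that two-stage flow driven from C#; every name here (Staging_Import, FinalTable, the file path, and the Name/Amount columns) is a placeholder:
using System.Data.SqlClient;

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();

    // Stage 1: bulk-load the pipe-delimited file into a forgiving
    // staging table whose columns are all varchar(max).
    using (var stage = new SqlCommand(@"
        BULK INSERT dbo.Staging_Import
        FROM 'C:\data\input.txt'
        WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n');", conn))
    {
        stage.ExecuteNonQuery();
    }

    // Stage 2: fix up the data in place, then convert the columns to
    // their final types and push the rows into the destination table.
    using (var move = new SqlCommand(@"
        UPDATE dbo.Staging_Import SET Amount = '0'    WHERE Amount IS NULL;
        UPDATE dbo.Staging_Import SET Name   = 'nada' WHERE LEN(Name) = 0;

        INSERT INTO dbo.FinalTable (Name, Amount)
        SELECT Name, CAST(Amount AS int) FROM dbo.Staging_Import;", conn))
    {
        move.ExecuteNonQuery();
    }
}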

Receiving XML files every day - 12 types, need to do search on these daily

ASP.NET - C#
I need advice regarding the design problem below:
I receive XML files every day, in varying quantities: e.g. yesterday 10 XML files were received, today 56, maybe tomorrow 161, etc.
There are 12 types (12 XSDs), and at the top there is an attribute called FormType, e.g. FormType="1", FormType="2", up to FormType="12".
All of them have common fields like Name, adres, Phone.
But, e.g., FormType=1 is for Construction, FormType=2 is for IT, FormType=3 is for Hospitals, FormType=4 is for Advertisement, etc.
As I said, all of them have common attributes.
Requirements:
I need a search screen so the user can search these XML contents, but I don't have any clue how to approach this; e.g. search the text in some attributes of the XMLs received between Date_From and Date_To.
Problem:
I've heard about putting the XMLs in a binary field and doing XPath queries or the like, but I don't know the words to search for on Google.
I was thinking of creating one big database table, reading all the XMLs, and putting them into it. But the issue is that some XML attributes are huge, like 2-3 pages of text, while the same attributes in other XML files are empty.
So if I create an NVARCHAR(MAX) column for every XML attribute and put them all in table fields, after some period my database will become a big monster...
Can someone advise on the best approach to handle this?
I'm not 100% sure I understand your problem. I'm guessing that the query's supposed to return individual XML documents that meet some kind of user-specified criteria.
In that event, my starting point would probably be to implement a method for querying a single XML document, i.e. one that returns true if the document's a hit and false otherwise. In all likelihood, I'd make the query parameter an XPath query, but who knows? Here's a simple example:
// Requires System.Linq, System.Xml.Linq, and System.Xml.XPath.
public bool TestXml(XDocument d, string query)
{
    return d.XPathSelectElements(query).Any();
}
Next, I need a store of XML documents to query. Where does that store live, and what form does it take? At a certain level, those are implementation details that my application doesn't care about. They could live in a database, or the file system. They could be cached in memory. I'd start by keeping it simple, something like:
public IEnumerable<XDocument> XmlDocuments()
{
    DirectoryInfo di = new DirectoryInfo(XmlDirectoryPath);
    foreach (FileInfo fi in di.GetFiles())
    {
        yield return XDocument.Load(fi.FullName); // FileInfo exposes FullName, not Filename
    }
}
Now I can get all of the documents that fulfill a request like this:
public IEnumerable<XDocument> GetDocuments(string query)
{
    return XmlDocuments().Where(x => TestXml(x, query));
}
The thing that jumps out at me when I look at this problem: I have to parse my documents into XDocument objects to query them. That's going to happen whether they live in a database or the file system. (If I stick them in a database and write a stored procedure that does XPath queries, as someone suggested, I'm still parsing all of the XML every time I execute a query; I've just moved all that work to the database server.)
That's a lot of I/O and CPU time that gets spent doing the exact same thing over and over again. If the volume of queries is anything other than tiny, I'd consider building a List<XDocument> the first time GetDocuments() is called and come up with a scheme of keeping that list in memory until new XML documents are received (or possibly updating it when new XML documents are received).
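A minimal sketch of that caching idea, building on the methods above (the invalidation policy is left as a stub, since when and how new documents arrive is an application detail):
private List<XDocument> _cache;

public IEnumerable<XDocument> GetDocuments(string query)
{
    // Parse every document once; reuse the parsed list for later queries.
    if (_cache == null)
        _cache = XmlDocuments().ToList();
    return _cache.Where(x => TestXml(x, query));
}

// Call this when new XML files are received so the next query re-parses.
public void InvalidateCache()
{
    _cache = null;
}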
