creating a difference file from .csv files - c#

I am creating an application which converts an MS Access table and an Excel sheet to .csv files and then diffs the Access table against the Excel sheet. The .csv files are fine, but the resulting difference file has errors in fields that contain HTML (the Access table has fields with the HTML). I'm not sure if this is a special-character issue, because the special characters were not a problem when creating the .csv files in the first place, or if it is an issue with the way I am diffing the two files.
Part of the problem, I suppose, could be that in the Access .csv file the fields containing HTML are formatted so that some of the information ends up on separate lines instead of all on one line, which could be throwing off the reader, but I don't know how to correct this.
This is the code for creating the difference file:
string destination = Form2.destination;
string path = Path.Combine(destination, "en-US-diff.csv");
string difFile = path;
if (File.Exists(difFile))
{
    File.Delete(difFile);
}
using (var wtr = new StreamWriter(difFile))
{
    // Create the IEnumerable data sources
    string[] access = System.IO.File.ReadAllLines(csvOutputFile);
    string[] excel = System.IO.File.ReadAllLines(csvOutputFile2);
    // Create the query
    IEnumerable<string> differenceQuery = access.Except(excel);
    // Execute the query
    foreach (string s in differenceQuery)
    {
        wtr.WriteLine(s);
    }
}

Physical line versus logical line. One solution is to use a sentinel, which is simply an arbitrary string token selected in such a way so as not to confound the parsing process, for example "##||##".
When the input files are created, add the sentinel to the end of each line...
1,1,1,1,1,1,##||##
Going back to your code, System.IO.File.ReadAllLines(csvOutputFile) uses the Environment.NewLine string as its sentinel. This means that you need to replace this statement with the following (pseudo code)...
const string sentinel = "##||##";
string myString = File.ReadAllText("myFileName.csv");
string[] access = myString.Split(new string[] { sentinel },
                                 StringSplitOptions.RemoveEmptyEntries);
At that point you will have the CSV lines in your 'access' array the way you wanted, as a collection of 'logical' lines.
To clean things up further, you would also need to execute this statement on each line of your array...
line = line.Replace(Environment.NewLine, String.Empty).Trim();
That will remove the culprits and allow you to parse the CSV using the methods you have already developed. Of course this statement could be combined with the IO statements in a LINQ expression if desired.
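Putting the pieces above together, here is a minimal end-to-end sketch of the sentinel approach (the file name, sample data, and the "##||##" token are placeholders; adapt them to your own export routine):

```csharp
using System;
using System.IO;
using System.Linq;

class SentinelDemo
{
    const string Sentinel = "##||##";

    static void Main()
    {
        // Simulate a CSV file whose HTML field contains an embedded line break;
        // each logical line ends with the sentinel instead of a newline.
        string raw = "1,one,\"<p>first" + Environment.NewLine + "paragraph</p>\"" + Sentinel +
                     "2,two,\"<p>second</p>\"" + Sentinel;
        File.WriteAllText("access.csv", raw);

        // Split on the sentinel instead of Environment.NewLine, then strip
        // any newlines left inside each logical line.
        string[] access = File.ReadAllText("access.csv")
            .Split(new[] { Sentinel }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Replace(Environment.NewLine, string.Empty).Trim())
            .ToArray();

        Console.WriteLine(access.Length);   // 2 logical lines
        Console.WriteLine(access[0]);       // embedded newline removed
    }
}
```

The resulting array can then be fed to the Except-based diff exactly as in the question.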

Related

How to import string from a .txt file C#

I am working on a password generator, that uses elements of an array to generate a word-based password.
I currently am working with four arrays, with a lot of elements, and I have to hardcode them individually. I want to automate that process, because writing in a .txt file is both easier and cleaner than writing it on the code itself, and as I plan on distributing this program to my friends, I want to be able to make libraries for the arrays.
Simply put, the .txt file will have four lines, each for one of the arrays.
All I need to know currently is how to import each line of the text file as a single string, which will then be individually formatted into the arrays.
So, for example, the .txt file would have this:
a,b,c,d,e,f,g
d,e,f,g,h,i,j
g,h,i,j,k,l,m
j,k,l,m,n,o,p
And after the "fetching", four different strings would contain each of the lines:
string a = "a,b,c,d,e,f,g"
string b = "d,e,f,g,h,i,j"
string c = "g,h,i,j,k,l,m"
string d = "j,k,l,m,n,o,p"
I will then process it by this, for each string, to break them down into elements.
String pattern = @"\-";
String[] elements = System.Text.RegularExpressions.Regex.Split(passKey, pattern);
You can use this:
System.Collections.Generic.IEnumerable<String> lines = File.ReadLines("c:\\file.txt");
To put them in an array specifically, use:
string[] lines = File.ReadLines("c:\\file.txt").ToArray();
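As a minimal sketch of the whole round trip (the file name words.txt and its contents are hypothetical), you can read the four lines and split each one straight into an array of elements in a single pass:

```csharp
using System;
using System.IO;
using System.Linq;

class WordListDemo
{
    static void Main()
    {
        // Hypothetical word-list file: one comma-separated list per line.
        File.WriteAllLines("words.txt", new[]
        {
            "a,b,c,d,e,f,g",
            "d,e,f,g,h,i,j",
            "g,h,i,j,k,l,m",
            "j,k,l,m,n,o,p"
        });

        // Read each line lazily and split it into its individual elements.
        string[][] lists = File.ReadLines("words.txt")
                               .Select(line => line.Split(','))
                               .ToArray();

        Console.WriteLine(lists.Length);    // 4 arrays, one per line
        Console.WriteLine(lists[0][2]);     // "c"
    }
}
```

This skips the intermediate single-string step entirely; if you still want the raw lines as strings first, keep the File.ReadLines call and apply Split later.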

C# trouble creating a table in console

I'm struggling to get data to display in a table within a console application. I believe it may be something to do with the way I am getting the data to display.
I'm using this to read the content of text files:
string currentDir = Directory.GetCurrentDirectory();
string[] textFiles = Directory.GetFiles(currentDir, "*.txt");
string[] lines = new string[11];
for (int i = 0; i < textFiles.Length; i++)
{
    lines[i] = File.ReadAllText(textFiles[i]);
}
Then I'm trying to display all the content of the text file into a table, each text file has 600 entries and they all go together to make a table.
Console.WriteLine("{0,10} \t {1,15}", lines[0], lines[1]);
That was my attempt at getting them to display in a table, but only the last entry of lines[0] and the first entry of lines[1] are being put on the same line in the console... Anyone have any ideas?
You're reading the entire contents of a file in a simple string, so that's what you'll get in that string - the contents of the file, including any new lines, tabs, spaces and so on. If you want to manipulate individual bits of strings within those files, you'll need to split those strings according to some rules first.
The formatting alignment you're using doesn't do all that much by itself - see Alignment Component in https://msdn.microsoft.com/en-us/library/txafckwd.aspx
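To make that concrete, here is a minimal sketch (the file names and sample data are made up): split each file into its individual entries first, then print one entry per column per row, letting the alignment components pad the columns:

```csharp
using System;
using System.IO;

class TableDemo
{
    static void Main()
    {
        // Hypothetical data files: one entry per line in each file.
        File.WriteAllLines("names.txt",  new[] { "Alice", "Bob" });
        File.WriteAllLines("scores.txt", new[] { "91", "87" });

        // ReadAllLines splits the file contents into individual entries,
        // instead of one big string per file.
        string[] names  = File.ReadAllLines("names.txt");
        string[] scores = File.ReadAllLines("scores.txt");

        // Print row by row; {0,-10} left-aligns in a 10-char column,
        // {1,5} right-aligns in a 5-char column.
        for (int i = 0; i < names.Length; i++)
        {
            Console.WriteLine("{0,-10} {1,5}", names[i], scores[i]);
        }
    }
}
```

The key difference from the question's code is that each file is broken into entries before formatting, so the alignment applies to one value at a time rather than to the whole file contents.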

Reading a specific time file and writing the contents to another one

At runtime, I want to read all files that have the timestamp of a particular time. For example: if the application is running at 11:00, it should read all files created after 11:00:00 up until now (excluding the present one) and write their contents into the present file. I have tried:
string temp_file_format = "ScriptLog_" + DateTime.Now.ToString("dd_MM_yyyy_HH");
string path = @"C:\ScriptLogs";
var all_files = Directory.GetFiles(path, temp_file_format).SelectMany(File.ReadAllLines);
using (var w = new StreamWriter(logpath))
    foreach (var line in all_files)
        w.WriteLine(line);
But this doesn't seem to be working. No error, no exception, but it doesn't read the files, even though they exist.
The pattern parameter of the GetFiles method should probably also include a wildcard, something like:
string temp_file_format = "ScriptLog_" + DateTime.Now.ToString("dd_MM_yyyy_HH") + "*";
This will match all files starting with "ScriptLog_13_09_2013_11"
As @Edwin already solved your problem, I'd just like to add a suggestion regarding your code (mostly performance related).
Since you are only reading these lines in order to write them to a different file and discard them from memory, you should consider using File.ReadLines instead of File.ReadAllLines, because the latter method loads all lines from each file into memory unnecessarily.
Combine this with the File.WriteAllLines method, and you can simplify your code while reducing memory pressure to:
var all_files = Directory.GetFiles(path, temp_file_format);
// File.ReadLines returns a "lazy" IEnumerable<string> which will
// yield lines one by one
var all_lines = all_files.SelectMany(File.ReadLines);
// this iterates through all_lines and writes them to logpath
File.WriteAllLines(logpath, all_lines);
All that can even be written as a one-liner (that is, if you are not paid by your source code line count). ;-)
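For what it's worth, a sketch of that one-liner, using a hypothetical logs directory and file names just to make it self-contained:

```csharp
using System;
using System.IO;
using System.Linq;

class OneLinerDemo
{
    static void Main()
    {
        // Hypothetical setup: two log files matching the name pattern.
        Directory.CreateDirectory("logs");
        File.WriteAllLines(Path.Combine("logs", "ScriptLog_a.txt"), new[] { "line1" });
        File.WriteAllLines(Path.Combine("logs", "ScriptLog_b.txt"), new[] { "line2" });

        string logpath = "combined.log";

        // The whole find-read-merge-write pipeline in a single statement;
        // File.ReadLines keeps it lazy, so no file is fully buffered in memory.
        File.WriteAllLines(logpath,
            Directory.GetFiles("logs", "ScriptLog_*").SelectMany(File.ReadLines));

        Console.WriteLine(File.ReadAllLines(logpath).Length);   // 2
    }
}
```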

.txt file to .xlsx file (600 rows and 40 columns)

I have a program that combines three text files and put them all into one and sorts them all out alphabetically. I was wondering how I could possibly put this onto an excel spreadsheet without downloading and using the excellibrary (if that's possible).
Here's my code that combines all three files, if that helps.
private void button1_Click(object sender, EventArgs e) // merge files button
{
    System.IO.StreamWriter output = new System.IO.StreamWriter("OUTPUT.txt");
    String[] parts = File.ReadAllLines(textBox1.Text);   //gets filepath from top textbox
    String[] parts2 = File.ReadAllLines(textBox2.Text);  //gets filepath from middle textbox
    String[] head = File.ReadAllLines(headingFileBox.Text); //header file array
    //merging the two files onto one list; there is no need to merge the header file
    //because no math is being computed on it
    var list = new List<String>();
    list.AddRange(parts);
    list.AddRange(parts2);
    //foreach loop to write the header file into the output file
    foreach (string h in head)
    {
        output.WriteLine(h);
    }
    //prints 3 blank lines for spaces
    output.WriteLine();
    output.WriteLine();
    output.WriteLine();
    String[] partsComb = list.ToArray(); // string array that takes in the list
    Array.Sort(partsComb);
    //foreach loop to write the combined, sorted lines to the output file
    foreach (string s in partsComb)
    {
        output.WriteLine(s);
    }
    output.Close();
}
Any help would be much appreciated.
You could look at creating it in a CSV format (Comma-separated values). Excel naturally opens it up and loads the data into the rows and cells.
Basic CSV looks like this:
"Bob","Smith","12/1/2012"
"Jane","Doe","5/10/2004"
Some things, like wrapping every value in quotes, are optional, but quoting is needed if your data may contain the delimiter.
If you're okay with a comma-separated values (CSV) file, that's easy enough to generate with string manipulation, and it will load in Excel. If you need an Excel-specific format and are okay with XLSX, you can populate one with some XML manipulation and a ZIP library.
Fair warning: you will have to be careful about escaping commas and newlines if you choose a traditional CSV file. There are libraries that handle that as well.
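To show what that escaping involves, here is a minimal hand-rolled sketch. The quoting rules follow common CSV practice (wrap a field in quotes when it contains a delimiter, quote, or newline, and double any embedded quotes); the sample row is made up:

```csharp
using System;

class CsvEscapeDemo
{
    // Quote a field only when necessary, doubling embedded quotes.
    internal static string Escape(string field)
    {
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\n', '\r' }) >= 0;
        string escaped = field.Replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : field;
    }

    static void Main()
    {
        string[] row = { "Bob", "Smith, Jr.", "He said \"hi\"" };
        Console.WriteLine(string.Join(",", Array.ConvertAll(row, Escape)));
        // Bob,"Smith, Jr.","He said ""hi"""
    }
}
```

For anything beyond simple data, a dedicated CSV library spares you from re-deriving these edge cases.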
You might want to try Excel package plus: http://EPPlus.codeplex.com
It's free, lightweight, and can create xlsx files.

Parsing PostgreSQL CSV Log

I am working on a section of an application which needs to parse CSV logs generated by a PostgreSQL server.
The logs are stored in C:\Program Files\PostgreSQL\9.0\data\pg_log
The server version is 9.0.4
The application is developed in C#
The basic task after parsing the log is to show the contents in a DataGridView.
There are other filter options, like viewing log contents for a particular time range within a day.
However, the main problem is that the log format is not readable.
It was first tested with A Fast CSV Reader
Parsing CSV files in C#, with header
http://www.codeproject.com/KB/database/CsvReader.aspx
Then we made a custom utility using the String.Split method with the usual foreach loop going through the array.
A Sample Log data line
2012-03-21 11:59:20.640 IST,"postgres","stock_apals",3276,"localhost:1639",4f697540.ccc,10,"idle",2012-03-21 11:59:20 IST,2/163,0,LOG,00000,"statement: SELECT id,pdate,itemname,qty from stock_apals order by pdate,id",,,,,,,,"exec_simple_query, .\src\backend\tcop\postgres.c:900",""
As you can see, the columns in the log are comma separated, but individual values are not quote-enclosed.
For instance the 1st, 4th, 6th... columns.
Is there a utility or a regex that can find malformed columns and place quotes around them?
This is especially with respect to performance, because these logs are very long and new ones are created almost every hour.
I just want to update the columns and use the FastCSVReader to parse it.
Thanks for any advice and help.
I've updated my CSV parser, so it's now able to parse your data (at least as provided in the example). Below is an example console app which parses your data saved in a multiline_quotes.txt file. Project source can be found here (you can download a ZIP). You need either Gorgon.Parsing or Gorgon.Parsing.Net35 (in case you can't use .NET 4.0).
Actually, I was able to achieve the same result using Fast CSV Reader. You just used it the wrong way in the first place.
namespace So9817628
{
    using System.Data;
    using System.Text;
    using Gorgon.Parsing.Csv;

    class Program
    {
        static void Main(string[] args)
        {
            // prepare
            CsvParserSettings s = new CsvParserSettings();
            s.CodePage = Encoding.Default;
            s.ContainsHeader = false;
            s.SplitString = ",";
            s.EscapeString = "\"\"";
            s.ContainsQuotes = true;
            s.ContainsMultilineValues = true;
            // uncomment below if you don't want escape quotes ("") to be replaced with a single quote
            //s.ReplaceEscapeString = false;

            CsvParser parser = new CsvParser(s);
            DataTable dt = parser.ParseToDataTableSequential("multiline_quotes.txt");
            dt.WriteXml("parsed.xml");
        }
    }
}
