I have a StringBuilder that contains email IDs (thousands of them):
StringBuilder sb = new StringBuilder();
foreach (DataRow dr2 in dtResult.Rows)
{
strtxt = dr2[strMailID].ToString()+";";
sb.Append(strtxt);
}
string filepathEmail = Server.MapPath("Email");
using (StreamWriter outfile = new StreamWriter(filepathEmail + "\\" + "Email.txt"))
{
outfile.Write(sb.ToString());
}
Now the data is getting stored in the text file like this:
abc@gmail.com;ab@gmail.com;abc@gmail.com;ab@gmail.com;abc@gmail.com;ab@gmail.com;abc@gmail.com;ab@gmail.com;abc@gmail.com;ab@gmail.com;abc@gmail.com;ab@gmail.com;abc@gmail.com;ab@gmail.com;abc@gmail.com;ab@gmail.com;abc@gmail.com;ab@gmail.com;abc@gmail.com;ab@gmail.com;abc@gmail.com;ab@gmail.com;abc@gmail.com;ab@gmail.com;abc@gmail.com;ab@gmail.com;
abc@gmail.com;ab@gmail.com;abc@gmail.com;ab@gmail.com;abc@gmail.com;ab@gmail.com;abc@gmail.com;ab@gmail.com;
But I need to store them so that every row has only 10 email IDs, so that it looks good.
Any idea how to format the data like this in the .txt file? Any help would be great.
Just add a counter in your loop and append a line break after every 10 items.
int counter = 0;
StringBuilder sb = new StringBuilder();
foreach (DataRow dr2 in dtResult.Rows)
{
counter++;
strtxt = dr2[strMailID].ToString()+";";
sb.Append(strtxt);
if (counter % 10 == 0)
{
sb.Append(Environment.NewLine);
}
}
Use a counter and add a line break after each tenth item:
StringBuilder sb = new StringBuilder();
int cnt = 0;
foreach (DataRow dr2 in dtResult.Rows) {
sb.Append(dr2[strMailID]).Append(';');
if (++cnt == 10) {
cnt = 0;
sb.AppendLine();
}
}
string filepathEmail = Path.Combine(Server.MapPath("Email"), "Email.txt");
File.WriteAllText(filepathEmail, sb.ToString());
Notes:
Concatenate strings using the StringBuilder instead of first concatenating and then appending.
Use Path.Combine to combine the path and file name; this works on any platform.
You can use the File.WriteAllText method to save the string in a single call instead of writing to a StreamWriter.
As said above, you can add a line break; I'd also suggest adding a '\t' tab after each address, so the file is in a tab-delimited format and you can import it into Excel, for instance.
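A minimal sketch of that tab-delimited variant, reusing the dtResult and strMailID names from the question (the counter logic is the same as in the other answers):
int count = 0;
StringBuilder sb = new StringBuilder();
foreach (DataRow dr2 in dtResult.Rows)
{
    sb.Append(dr2[strMailID]).Append('\t'); // tab instead of ';'
    if (++count % 10 == 0)
        sb.AppendLine(); // start a new row every 10 addresses
}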
Use a counter to keep track of the number of mails already written, like this:
int i = 0;
foreach (string mail in mails) {
var strtxt = mail + ";";
sb.Append(strtxt);
i++;
if (i % 10==0)
sb.AppendLine();
}
Every 10 mails written, i modulo 10 equals 0, so you append a line break to the StringBuilder.
Hope this can help.
Here's an alternate method using LINQ, if you don't mind the overhead.
string filepathEmail = Server.MapPath("Email");
using (StreamWriter outfile = new StreamWriter(filepathEmail + "\\" + "Email.txt"))
{
var rows = dtResult.Rows.Cast<DataRow>(); //make the rows enumerable
var lines = from ivp in rows.Select((dr2, i) => new {i, dr2})
group ivp.dr2[strMailID] by ivp.i / 10 into line //group every 10 emails
select String.Join(";", line); //put them into a string
foreach (string line in lines)
outfile.WriteLine(line);
}
I have a csv file with 2 million rows and a file size of 2 GB. But due to a couple of free-text form columns, some rows contain stray CRLFs, which cause the file to fail to load into the SQL Server table. I get an error that the last column does not end with ".
I have the following code, but it gives an OutOfMemoryException when reading from fileName. The line is:
var lines = File.ReadAllLines(fileName);
How can I fix it? Ideally, I would like to split the file into good and bad rows, or delete the rows that do not end with " followed by CRLF.
int goodRow = 0;
int badRow = 0;
String badRowFileName = fileName.Substring(0, fileName.Length - 4) + "BadRow.csv";
String goodRowFileName = fileName.Substring(0, fileName.Length - 4) + "GoodRow.csv";
var charGood = "\"\"";
String lineOut = string.Empty;
String str = string.Empty;
var lines = File.ReadAllLines(fileName);
StringBuilder sbGood = new StringBuilder();
StringBuilder sbBad = new StringBuilder();
foreach (string line in lines)
{
if (line.Contains(charGood))
{
goodRow++;
sbGood.AppendLine(line);
}
else
{
badRow++;
sbBad.AppendLine(line);
}
}
if (badRow > 0)
{
File.WriteAllText(badRowFileName, sbBad.ToString());
}
if (goodRow > 0)
{
File.WriteAllText(goodRowFileName, sbGood.ToString());
}
sbGood.Clear();
sbBad.Clear();
msg = msg + "Good Rows - " + goodRow.ToString() + " Bad Rows - " + badRow.ToString() + " Done.";
You can translate that code like this to be much more efficient:
int goodRow = 0, badRow = 0;
String badRowFileName = fileName.Substring(0, fileName.Length - 4) + "BadRow.csv";
String goodRowFileName = fileName.Substring(0, fileName.Length - 4) + "GoodRow.csv";
var charGood = "\"\"";
var lines = File.ReadLines(fileName); // ReadLines returns a lazy IEnumerable<string>, so no using block is needed here
using (var swGood = new StreamWriter(goodRowFileName))
using (var swBad = new StreamWriter(badRowFileName))
{
foreach (string line in lines)
{
if (line.Contains(charGood))
{
goodRow++;
swGood.WriteLine(line);
}
else
{
badRow++;
swBad.WriteLine(line);
}
}
}
msg += $"Good Rows: {goodRow,9} Bad Rows: {badRow,9} Done.";
But I'd also look at using a real csv parser for this. There are plenty on NuGet. That might even let you clean up the data on the fly.
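For example, with CsvHelper (one popular parser on NuGet) a rough sketch might look like this, using dynamic records so no class definition is needed; this is an illustration of the idea, not a drop-in replacement:
using System.Globalization;
using CsvHelper;

using (var reader = new StreamReader(fileName))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
    // GetRecords parses quoted fields, so a CRLF inside quotes
    // stays part of the field instead of splitting the row
    foreach (var record in csv.GetRecords<dynamic>())
    {
        // inspect or clean up each record here
    }
}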
I would not suggest reading the entire file into memory, processing it, and then writing all the modified contents out to a new file. Instead, use file streams:
using (var rdr = new StreamReader(fileName))
using (var wrtrGood = new StreamWriter(goodRowFileName))
using (var wrtrBad = new StreamWriter(badRowFileName))
{
string line = null;
while ((line = rdr.ReadLine()) != null)
{
if (line.Contains(charGood))
{
goodRow++;
wrtrGood.WriteLine(line);
}
else
{
badRow++;
wrtrBad.WriteLine(line);
}
}
}
Text file
I have this text file and need to organize the data, ordered, into a table. Note: it needs to be a C# console app. So far I only have this:
StreamReader sr = new StreamReader(@"filepatch.txt");
string ler = sr.ReadLine();
string linha = ";";
int cont = 0;
while((linha = sr.ReadLine())!= null)
{
string col = linha.Split(';')[2];
cont++;
Console.WriteLine("{0} : {1}", cont, linha);
}
Try this to get the file text:
var lines = System.IO.File.ReadAllLines(@"filepatch.txt");
Then you can use the returned string[] to carry out the rest of your logic.
foreach(var line in lines)
{
string[] cols = line.Split(';');
// Your logic here.
}
Cheers!
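If the goal is an ordered, column-aligned table on the console, a sketch like this could follow (assuming three ';'-separated columns, a header line to skip, and sorting on the first column; adjust to the real file layout):
using System;
using System.IO;
using System.Linq;

var rows = File.ReadAllLines(@"filepatch.txt")
               .Skip(1)                        // skip the header line
               .Select(line => line.Split(';'))
               .OrderBy(cols => cols[0]);      // order by the first column

foreach (var cols in rows)
    Console.WriteLine("{0,-20} {1,-20} {2,-20}", cols[0], cols[1], cols[2]);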
Copying CSV file while reordering/adding empty columns.
For example, if every line of the incoming file has values for 3 out of 10 columns, in an order different from the output, like this (except the first line, which is the header with the column names):
col2,col6,col4 // first line - column names
2, 5, 8 // subsequent lines - values for 3 columns
and output expected to have
col0,col1,col2,col3,col4,col5,col6,col7,col8,col9
then the output should have "" for col0, col1, col3, col5, col7, col8, col9, and the values from col2, col4, col6 of the input file. So for the second line shown (2,5,8) the expected output is ",,2,,5,,8,,,".
The code below is what I've tried, and it is slower than I want.
I have two lists.
The first list, filecolumnnames, is created by splitting a delimited string (a line), and this list gets recreated for every line in the file.
The second list, list, has the order in which the first list needs to be rearranged and re-concatenated.
This works:
string fileName = "F:\\temp.csv";
//file data has first row col3,col2,col1,col0;
//second row: 4,3,2,1
//so on
string fileName_recreated = "F:\\temp_1.csv";
int count = 0;
const Int32 BufferSize = 1028;
using (var fileStream = File.OpenRead(fileName))
using (var streamReader = new StreamReader(fileStream, Encoding.UTF8, true, BufferSize))
{
String line;
List<int> list = new List<int>();
string orderedcolumns = "\"\"";
string tableheader = "col0,col1,col2,col3,col4,col5,col6,col7,col8,col9,col10";
List<string> tablecolumnnames = new List<string>();
List<string> filecolumnnames = new List<string>();
while ((line = streamReader.ReadLine()) != null)
{
count = count + 1;
StringBuilder sb = new StringBuilder("");
tablecolumnnames = tableheader.Split(',').ToList();
if (count == 1)
{
string fileheader = line;
//fileheader=""col2,col1,col0"
filecolumnnames = fileheader.Split(',').ToList();
foreach (string col in tablecolumnnames)
{
int index = filecolumnnames.IndexOf(col);
if (index == -1)
{
sb.Append(",");
// orderedcolumns=orderedcolumns+"+\",\"";
list.Add(-1);
}
else
{
sb.Append(filecolumnnames[index] + ",");
//orderedcolumns = orderedcolumns+ "+filecolumnnames["+index+"]" + "+\",\"";
list.Add(index);
}
// MessageBox.Show(orderedcolumns);
}
}
else
{
filecolumnnames = line.Split(',').ToList();
foreach (int items in list)
{
//MessageBox.Show(items.ToString());
if (items == -1)
{
sb.Append(",");
}
else
{
sb.Append(filecolumnnames[items] + ",");
}
}
//expected format sb.Append(filecolumnnames[3] + "," + filecolumnnames[2] + "," + filecolumnnames[2] + ",");
//sb.Append(orderedcolumns);
var result = String.Join (", ", list.Select(index => filecolumnnames[index]));
}
using (FileStream fs = new FileStream(fileName_recreated, FileMode.Append, FileAccess.Write))
using (StreamWriter sw = new StreamWriter(fs))
{
sw.WriteLine(sb.ToString());
}
    }
}
I am trying to make it faster by constructing a string, orderedcolumns, ahead of time and removing the second foreach loop (which runs for every row), replacing it with the constructed string.
So if you uncomment the orderedcolumns construction (orderedcolumns = orderedcolumns+ "+filecolumnnames["+index+"]" + "+\",\"";) and uncomment the append sb.Append(orderedcolumns);, I expect the values inside the constructed string, but when I append orderedcolumns it appends the literal text, i.e.
""+","+filecolumnnames[3]+","+filecolumnnames[2]+","+filecolumnnames[1]+","+filecolumnnames[0]+","+","+","+","+","+","+","
That is, I want it to take the value inside the filecolumnnames[3] list element, not the name filecolumnnames[3] itself.
Expected value: if that line has 1,2,3,4, I want the output to be 4,3,2,1, since filecolumnnames[3] will hold 4, filecolumnnames[2] will hold 3, and so on.
String.Join is the way to construct comma/space-delimited strings from a sequence (guarding against the -1 entries your list holds for missing columns):
var result = String.Join(", ", list.Select(index => index == -1 ? "" : filecolumnnames[index]));
Since you are reading only a subset of the columns, and the orders in input and output don't match, I'd use a dictionary to hold each row of input, keyed by the input file's own column names:
var row = filecolumnnames
    .Zip(line.Split(','), (Name, Value) => new { Name, Value })
    .ToDictionary(x => x.Name, x => x.Value);
For the output I'd fill the sequence from defaults or the input row:
var outputLine = String.Join(",",
    tablecolumnnames
        .Select(name => row.ContainsKey(name) ? row[name] : ""));
Note: the code is typed in and not compiled.
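Put together, a sketch of the whole loop built on that idea might look like this (untested, reusing the fileName, fileName_recreated, and tableheader names from the question):
using (var reader = new StreamReader(fileName))
using (var writer = new StreamWriter(fileName_recreated))
{
    string[] tablecolumnnames = tableheader.Split(',');
    string[] filecolumnnames = null;
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        if (filecolumnnames == null) // first line: the input header
        {
            filecolumnnames = line.Split(',');
            writer.WriteLine(tableheader);
            continue;
        }
        var row = filecolumnnames
            .Zip(line.Split(','), (Name, Value) => new { Name, Value })
            .ToDictionary(x => x.Name, x => x.Value);
        writer.WriteLine(String.Join(",",
            tablecolumnnames.Select(name => row.ContainsKey(name) ? row[name] : "")));
    }
}
Opening the output writer once, instead of re-opening it in append mode for every row, is itself a significant speed-up.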
orderedcolumns = orderedcolumns+ "+filecolumnnames["+index+"]" + "+\",\"";
should be
orderedcolumns = orderedcolumns+ filecolumnnames[index] + ",";
You should however use Join, as others have pointed out. Or, with orderedcolumns as a StringBuilder:
orderedcolumns.AppendFormat("{0},", filecolumnnames[index]);
You will have to deal with the extra ',' on the end.
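With the StringBuilder approach, trimming that trailing ',' is a one-liner, e.g.:
if (orderedcolumns.Length > 0)
    orderedcolumns.Length--; // drop the trailing ','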
I successfully imported the following file into the database, but my import method removes the double quotes during the saving process. However, I want to export this file as it is, i.e. add quotes back to any string that contains the delimiter. How do I achieve this?
Here is my csv file, with headers and 1 record:
PTNAME,REGNO/ID,BLOOD GRP,WARD NAME,DOC NAME,XRAY,PATHO,MEDICATION,BLOOD GIVEN
Mr. GHULAVE VASANTRAO PANDURANG,SH1503/00847,,RECOVERY,SHELKE SAMEER,"X RAY PBH RT IT FEMUR FRACTURE POST OP XRAY -ACCEPTABLE WITH IMPLANT IN SITU 2D ECHO MILD CONC LVH GOOD LV SYSTOLIC FUN, ALTERED LV DIASTOLIC FUN.", HB-11.9gm% TLC-8700 PLT COUNT-195000 BSL-173 UREA -23 CREATININE -1.2 SR.ELECTROLYTES-WNR BLD GROUP-B + HIV-NEGATIVE HBsAG-NEGATIVE PT INR -15/15/1.0. ECG SINUS TACHYCARDIA ,IV TAXIMAX 1.5 GM 1-0-1 IV TRAMADOL DRIP 1-0-1 TAB NUSAID SP 1-0-1 TAB ARCOPAN D 1-0-1 CAP BONE C PLUS 1 -0-1 TAB ANXIT 0.5 MG 0-0-1 ANKLE TRACTION 3 KG RT LL ,NOT GIVEN
Here is my export method:
public void DataExport(string SelectQuery, string fileName)
{
try
{
DataTable dt = new DataTable();
SqlDataAdapter da = new SqlDataAdapter(SelectQuery, con);
da.Fill(dt);
//Sets file path and print Headers
// string filepath = txtreceive.Text + "\\" + fileName;
string filepath = @"C:\Users\Priya\Desktop\R\z.csv";
StreamWriter sw = new StreamWriter(filepath);
int iColCount = dt.Columns.Count;
// First we will write the headers if IsFirstRowColumnNames is true: //
for (int i = 0; i < iColCount; i++)
{
sw.Write(dt.Columns[i]);
if (i < iColCount - 1)
{
sw.Write(',');
}
}
sw.Write(sw.NewLine);
foreach (DataRow dr in dt.Rows) // Now write all the rows.
{
for (int i = 0; i < iColCount; i++)
{
if (!Convert.IsDBNull(dr[i]))
{
sw.Write(dr[i].ToString());
}
if (i < iColCount - 1)
{
sw.Write(',');
}
}
sw.Write(sw.NewLine);
}
sw.Close();
}
catch { }
}
if (myString.Contains(","))
{
myWriter.Write("\"{0}\"", myString);
}
else
{
myWriter.Write(myString);
}
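Note that checking for the comma alone is not full CSV quoting: RFC 4180-style output also quotes fields that contain quotes or line breaks, and doubles any embedded quotes. A small helper along those lines (the name CsvEscape is just illustrative):
static string CsvEscape(string field)
{
    // Quote if the field contains a delimiter, a quote, or a line break,
    // and double any embedded quotes, per the usual CSV convention.
    if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0)
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    return field;
}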
The very simplest thing that you can do is replace the line sw.Write(dr[i].ToString()); with this:
var text = dr[i].ToString();
text = text.Contains(",") ? String.Format("\"{0}\"", text) : text;
sw.Write(text);
However, there are quite a few other issues with your code - most importantly you are opening a lot of disposable resources without disposing them properly.
I'd suggest a bit of a rewrite, like this:
public void DataExport(string SelectQuery, string fileName)
{
using (var dt = new DataTable())
{
using (var da = new SqlDataAdapter(SelectQuery, con))
{
da.Fill(dt);
var header = String.Join(
",",
dt.Columns.Cast<DataColumn>().Select(dc => dc.ColumnName));
var rows =
from dr in dt.Rows.Cast<DataRow>()
select String.Join(
",",
from dc in dt.Columns.Cast<DataColumn>()
let t1 = Convert.IsDBNull(dr[dc]) ? "" : dr[dc].ToString()
let t2 = t1.Contains(",") ? String.Format("\"{0}\"", t1) : t1
select t2);
using (var sw = new StreamWriter(fileName))
{
sw.WriteLine(header);
foreach (var row in rows)
{
sw.WriteLine(row);
}
sw.Close();
}
}
}
}
I've also broken apart the querying of the data from the data adapter from the writing of the data to the stream writer.
And, of course, I'm adding double-quotes to text that contains commas.
The only other thing I was concerned about was the fact that con is clearly a class-level variable and it is being left open. That's bad. Connections should be opened and closed each time they are used. You should probably consider making that change too.
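For instance, a pattern along these lines (the connection string name is illustrative) keeps the connection's lifetime scoped to the query; SqlDataAdapter.Fill will even open and close the connection for you if you hand it a closed one:
using (var con = new SqlConnection(connectionString))
using (var da = new SqlDataAdapter(SelectQuery, con))
using (var dt = new DataTable())
{
    da.Fill(dt); // Fill opens the closed connection and closes it again
    // ... write dt out as above ...
}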
You could also remove the stream write entirely by replacing that block with this:
File.WriteAllLines(fileName, new [] { header }.Concat(rows));
And, finally, wrapping your code in a try { ... } catch { } is just a bad practice. It's like saying "I'm writing some code that could fail, but I don't care and I don't want to be told if it does fail." You should only ever catch specific exceptions that you actually deal with. In this code you should consider catching file exceptions, like running out of hard drive space or writing to a read-only file, etc.
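For example, a caller might handle just the file-related failures it can report on (a sketch, with illustrative messages):
try
{
    DataExport(selectQuery, fileName);
}
catch (IOException ex) // e.g. disk full, file locked by another process
{
    Console.Error.WriteLine("Export failed: " + ex.Message);
}
catch (UnauthorizedAccessException ex) // e.g. target file is read-only
{
    Console.Error.WriteLine("Access denied: " + ex.Message);
}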
I have a csv file containing 1,500,000 records and need to find the duplicate rows in it. I am trying the code below:
DataTable dtUniqueDataView = dvDataView.ToTable(true, Utility.GetHeadersFromCsv(csvfilePath).Select(c => c.Trim()).ToArray());
But with this I am not getting the duplicate records, and the operation takes nearly 4 minutes. Can anyone suggest a process that would reduce the time and give the duplicate result set?
Not the final solution but maybe something to start with:
Read the CSV file line by line and calculate a hash value of each line. You should be able to keep those numeric values in memory.
String.GetHashCode() is not good enough for this purpose, as it may return the same result for different strings, as pointed out correctly in the comments. A more stable hashing algorithm is required.
Store them away in a HashSet<int> and check if the value already exists in there. If yes, you can skip the row.
Note: if most of the time is spent reading the file (as assumed in the comment above), you will have to work on that issue first. My assumption is that you're worried about finding the duplicates.
Read the csv file as a stream, one line at a time. For each line read, calculate its MD5 hash and check whether that hash already exists in your stash. If it does, the row is a duplicate.
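A rough sketch of that idea, keeping only the hashes in memory (all names here are illustrative):
using System.Security.Cryptography;
using System.Text;

var seen = new HashSet<string>();
using (var md5 = MD5.Create())
{
    foreach (string line in File.ReadLines(csvFilePath))
    {
        byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(line));
        if (!seen.Add(Convert.ToBase64String(hash)))
        {
            // Add returned false: this line's hash was already seen,
            // so the row is a duplicate
        }
    }
}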
I wrote an example with Hashset:
Output (15,000,000 entries in a csv file):
Reading File
File distinct read in 1600,6632 ms
Output (30,000,000 entries in a csv file):
Reading File
File distinct read in 3192,1997 ms
Output (45,000,000 entries in a csv file):
Reading File
File distinct read in 4906,0755 ms
class Program
{
static void Main(string[] args)
{
string csvFile = "test.csv";
if (!File.Exists(csvFile)) //Create a test CSV file
CreateCSVFile(csvFile, 45000000, 15000);
List<string> distinct = GetDistinct(csvFile); //Returns every line once
Console.ReadKey();
}
static List<string> GetDistinct(string filename)
{
Stopwatch sw = new Stopwatch();//just a timer
List<HashSet<string>> lines = new List<HashSet<string>>(); //Hashset is very fast in searching duplicates
HashSet<string> current = new HashSet<string>(); //This hashset is used at the moment
lines.Add(current); //Add the current Hashset to a list of hashsets
sw.Restart(); //just a timer
Console.WriteLine("Reading File"); //just an output message
foreach (string line in File.ReadLines(filename))
{
try
{
if (lines.TrueForAll(hSet => !hSet.Contains(line))) //Look for an existing entry in one of the hashsets
current.Add(line); //If the line was not found, add it to the current hashset
}
catch (OutOfMemoryException ex) //Hashset throws an exception at around 12,000,000 elements
{
current = new HashSet<string>() { line }; //The line could not be added before; add it to the new hashset
lines.Add(current); //add the current hashset to the List of hashsets
}
}
sw.Stop();//just a timer
Console.WriteLine("File distinct read in " + sw.Elapsed.TotalMilliseconds + " ms");//just an output message
List<string> concatenated = new List<string>(); //Create a list of strings out of the hashset list
lines.ForEach(set => concatenated.AddRange(set)); //Fill the list of strings
return concatenated; //Return the list
}
static void CreateCSVFile(string filename, int entries, int duplicateRow)
{
using (FileStream fs = File.OpenWrite(filename))
using (StreamWriter sw = new StreamWriter(fs))
{
Random r = new Random();
string duplicateLine = null;
string line = "";
for (int i = 0; i < entries; i++)
{
line = r.Next(1, 10) + ";" + r.Next(11, 45) + ";" + r.Next(20, 500) + ";" + r.Next(2, 11) + ";" + r.Next(12, 46) + ";" + r.Next(21, 501);
sw.WriteLine(line);
if (i % duplicateRow == 0)
{
if (duplicateLine != null && i < entries - 1)
{
sw.WriteLine(duplicateLine); //actually write the duplicate row to the file
i++;
}
duplicateLine = line;
}
}
}
}
}