Write to CSV file and export it? - c#

In C# ASP.net, could someone show me how I can write entries from an Array/List to a CSV file on the server and then open the file? I think the second part would be something like - Response.Redirect("http://myserver.com/file.csv"), however not sure on how to write the file on the server.
Also if this page is accessed by many users, is it better to generate a new CSV file every time or overwrite the same file? Would there be any read/write/lock issues if both users try accessing the same CSV file etc.?
Update:
This is probably a silly question and I have searched on Google but I'm not able to find a definitive answer - how do you write a CSV file to the webserver and export it in C# ASP.net? I know how to generate it but I would like to save it to www.mysite.com/my.csv and then export it.

Rom, you're doing it wrong. You don't want to write files to disk so that IIS can serve them up. That adds security implications as well as increases complexity. All you really need to do is save the CSV directly to the response stream.
Here's the scenario: User wishes to download csv. User submits a form with details about the csv they want. You prepare the csv, then provide the user a URL to an aspx page which can be used to construct the csv file and write it to the response stream. The user clicks the link. The aspx page is blank; in the page codebehind you simply write the csv to the response stream and end it.
You can add the following to the (I believe this is correct) Load event:
string attachment = "attachment; filename=MyCsvLol.csv";
HttpContext.Current.Response.Clear();
HttpContext.Current.Response.ClearHeaders();
HttpContext.Current.Response.ClearContent();
HttpContext.Current.Response.AddHeader("content-disposition", attachment);
HttpContext.Current.Response.ContentType = "text/csv";
HttpContext.Current.Response.AddHeader("Pragma", "public");
var sb = new StringBuilder();
foreach(var line in DataToExportToCSV)
sb.AppendLine(TransformDataLineIntoCsv(line));
HttpContext.Current.Response.Write(sb.ToString());
writing to the response stream code ganked from here.

Here's a very simple free open-source CsvExport class for C#. There's an ASP.NET MVC example at the bottom.
https://github.com/jitbit/CsvExport
It takes care about line-breaks, commas, escaping quotes, MS Excel compatibilty... Just add one short .cs file to your project and you're good to go.
(disclaimer: I'm one of the contributors)

Here is a CSV action result I wrote that takes a DataTable and converts it into CSV. You can return this from your view and it will prompt the user to download the file. You should be able to convert this easily into a List compatible form or even just put your list into a DataTable.
using System;
using System.Text;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.Mvc;
using System.Data;
namespace Detectent.Analyze.ActionResults
{
public class CSVResult : ActionResult
{
/// <summary>
/// Converts the columns and rows from a data table into an Microsoft Excel compatible CSV file.
/// </summary>
/// <param name="dataTable"></param>
/// <param name="fileName">The full file name including the extension.</param>
public CSVResult(DataTable dataTable, string fileName)
{
Table = dataTable;
FileName = fileName;
}
public string FileName { get; protected set; }
public DataTable Table { get; protected set; }
public override void ExecuteResult(ControllerContext context)
{
StringBuilder csv = new StringBuilder(10 * Table.Rows.Count * Table.Columns.Count);
for (int c = 0; c < Table.Columns.Count; c++)
{
if (c > 0)
csv.Append(",");
DataColumn dc = Table.Columns[c];
string columnTitleCleaned = CleanCSVString(dc.ColumnName);
csv.Append(columnTitleCleaned);
}
csv.Append(Environment.NewLine);
foreach (DataRow dr in Table.Rows)
{
StringBuilder csvRow = new StringBuilder();
for(int c = 0; c < Table.Columns.Count; c++)
{
if(c != 0)
csvRow.Append(",");
object columnValue = dr[c];
if (columnValue == null)
csvRow.Append("");
else
{
string columnStringValue = columnValue.ToString();
string cleanedColumnValue = CleanCSVString(columnStringValue);
if (columnValue.GetType() == typeof(string) && !columnStringValue.Contains(","))
{
cleanedColumnValue = "=" + cleanedColumnValue; // Prevents a number stored in a string from being shown as 8888E+24 in Excel. Example use is the AccountNum field in CI that looks like a number but is really a string.
}
csvRow.Append(cleanedColumnValue);
}
}
csv.AppendLine(csvRow.ToString());
}
HttpResponseBase response = context.HttpContext.Response;
response.ContentType = "text/csv";
response.AppendHeader("Content-Disposition", "attachment;filename=" + this.FileName);
response.Write(csv.ToString());
}
protected string CleanCSVString(string input)
{
string output = "\"" + input.Replace("\"", "\"\"").Replace("\r\n", " ").Replace("\r", " ").Replace("\n", "") + "\"";
return output;
}
}
}

A comment about Will's answer, you might want to replace HttpContext.Current.Response.End(); with HttpContext.Current.ApplicationInstance.CompleteRequest(); The reason is that Response.End() throws a System.Threading.ThreadAbortException. It aborts a thread. If you have an exception logger, it will be littered with ThreadAbortExceptions, which in this case is expected behavior.
Intuitively, sending a CSV file to the browser should not raise an exception.
See here for more Is Response.End() considered harmful?

How to write to a file (easy search in Google) ... 1st Search Result
As far as creation of the file each time a user accesses the page ... each access will act on it's own behalf. You business case will dictate the behavior.
Case 1 - same file but does not change (this type of case can have multiple ways of being defined)
You would have logic that created the file when needed and only access the file if generation is not needed.
Case 2 - each user needs to generate their own file
You would decide how you identify each user, create a file for each user and access the file they are supposed to see ... this can easily merge with Case 1. Then you delete the file after serving the content or not if it requires persistence.
Case 3 - same file but generation required for each access
Use Case 2, this will cause a generation each time but clean up once accessed.

check out csvreader/writer library at http://www.codeproject.com/KB/cs/CsvReaderAndWriter.aspx

Related

Extracting and Sorting data from pdf using C# package

I'm working on a project where I have to extract specific text from a pdf so that I can send these info into an excel file.
I tried at first to convert my pdf into a .txt file thinking a .txt file format would be easier to convert into json.
But the result is not at all what I need (dictionary-style Json format) but instead a kind of giant messy string .
The pdf sample looks like this:
Analysis
Some text
Reference Date (Big space) 11/17/2021
Reference Price (Big space) USD 745
Client id (Big space) 4572845
I'd like to have something like this at the end:
{Analysis:Some text, Reference Date:11/17/2021, Reference Price:USD 745, Client id:4572845}
Currently the results give all the info mixed up between each others.
Here is my code:
First, I created a "Global" class where I will create the method "Extract_Row_Info_TS that will basically load the first page of the document (called a TS or Termsheet) and extract the text from the PDF and store it into a txt file called "result.txt":
class Global
{
public static void Extract_RowInfo_TS(string doc_Type, string docPath, int? nbrPage = null)
{
switch (doc_Type)
{
case "Pdf":
Spire.Pdf.PdfDocument doc = new Spire.Pdf.PdfDocument();
doc.LoadFromFile(docPath);
StringBuilder buffer = new StringBuilder();
//Extract text from the first page only
Spire.Pdf.PdfPageBase pagefirst = doc.Pages[0];
buffer.Append(pagefirst.ExtractText());
doc.Close();
//save text
String fileName = #"my_disk:\my_path\result.txt";
File.WriteAllText(fileName, buffer.ToString());
//Load File
System.Diagnostics.Process.Start(fileName);
break;
case "Excel":
Spire.Xls.Workbook Wb = new Spire.Xls.Workbook();
break;
case "Word":
Spire.Doc.Document doc_word = new Spire.Doc.Document();
break;
}
}
}
Come back to my main page, I call the above method "Extract_RowInfo_TS" from above Global class and when it created "result.txt" from the pdf infos, I'll try to convert this "result.txt" into a json format:
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void btn_Extract_PDF_Click(object sender, EventArgs e)
{
Global.Extract_RowInfo_TS("Pdf", #"my_disk:\my_path\my_doc.pdf");
Convert_To_Json_Format(#"my_disk:\my_path\result.txt");
}
private void Convert_To_Json_Format(string baseTextFile)
{
string streamText = new StreamReader(baseTextFile).ReadToEnd();
//Serialize Json Data.
string serializeData = Serialize_into_Json(streamText);
string newFile = #"my_disk:\my_path\NEW_text_file_2.txt";
File.WriteAllText(newFile, serializeData);
System.Diagnostics.Process.Start(newFile);
}
private static string Serialize_into_Json(string json)
{
string jsonData = JsonConvert.SerializeObject(json);
return jsonData;
}
}
I'm stuck here trying to create a proper json format file (or anything alike actually, I just want to group info between them, maybe create a table first ? I don't know...) that I can use for sending into my Excel file. Any help would be much appreciated ! I'm using the Free version of Spire Nuget package v4.3.1 that contains Free Spire.PDF, Spire.Xls, Spire.Doc and more of them. But maybe there are some others solutions out there to achieve the goal I'm looking for.
Thanks in advance for helping and have a great day.

C# Read csv from url and save to database

I'm trying to get data from a csv-file from a Webservice.
If i paste the url in my browser, the csv will be downloaded and look like the following example:
"ID","ProductName","Company"
"1","Apples","Alfreds futterkiste"
"2","Oranges","Alfreds futterkiste"
"3","Bananas","Alfreds futterkiste"
"4","Salad","Alfreds futterkiste"
...next 96 rows
However I don't want to download the csv-file first and then extract data from it afterwards.
The webservice uses pagination and returns 100 rows (determined by the &num-parameter with a max of 100). After the first request i can use the &next-parameter to fetch the next 100 rows based on ID. For instance the url
http://testWebservice123.com/Example.csv?auth=abc&number=100&next=100
will get me rows from ID 101 to 200. So if there are a lot of rows i would end up downloading a lot of csv-files and saving them to the harddrive. So instead of downloading the csv-files first and saving them hdd to I want to get data directly from the webservice to be able to write directly to a database without saving the csv-files.
After a bit of search I came up with the following solution
static void Main(string[] args)
{
string startUrl = "http://testWebservice123.com/Example.csv?auth=abc&number=100";
string url = "";
string deltaRequestParameter = "";
string lastLine;
int numberOfLines = 0;
do
{
url = startUrl + deltaRequestParameter;
WebClient myWebClient = new WebClient();
using (Stream myStream = myWebClient.OpenRead(url))
{
using (StreamReader sr = new StreamReader(myStream))
{
numberOfLines = 0;
while (!sr.EndOfStream)
{
var row = sr.ReadLine();
var values = row.Split(',');
//do whatever with the rows by now - i.e. write to console
Console.WriteLine(values[0] + " " + values[1]);
lastLine = values[0].Replace("\"", ""); //last line in the loop - get the last ID.
numberOfLines++;
deltaRequestParameter = "&next=" + lastLine;
}
}
}
} while (numberOfLines == 101); //since the header is returned each time the number of rows will be 101 until we get to the last request
}
but im not sure if this is an "up to date" way of doing this, or if there is a better way (easier/simpler)? In other words i'm insecure about whether using WebClient and StreamReader is the right way to go?
In this thread: how to read a csv file from a url?
WebClient.DownloadString is mentioned as well as WebRequest. But if I want to write to a database without saving csv to hdd which is the best option?
Furhtermore - will the approach I have taken save data to a temporary disk storage behind the scenes or will all data be read into memmory and then disposed when the loop completes?
I have read the following documentation but can't seem to find out what it does behind the scenes:
StreamReader: https://learn.microsoft.com/en-us/dotnet/api/system.io.streamreader?view=netframework-4.7.2
Stream: https://learn.microsoft.com/en-us/dotnet/api/system.io.stream?view=netframework-4.7.2
Edit:
I guess I could also be using the following "TextFieldParser"...but my questions is really still the same:
(using the Assembly Microsoft.VisualBasic)
using (Stream myStream = myWebClient.OpenRead(url))
{
using (TextFieldParser parser = new TextFieldParser(myStream))
{
numberOfLines = 0;
parser.TrimWhiteSpace = true; // if you want
parser.Delimiters = new[] { "," };
parser.HasFieldsEnclosedInQuotes = true;
while (!parser.EndOfData)
{
string[] line = parser.ReadFields();
Console.WriteLine(line[0].ToString() + " " + line[1].ToString());
numberOfLines++;
deltaRequestParameter = "&next=" + line[0].ToString();
}
}
}
The HttpClient class on System.Web.Http is available as of .Net 4.5. You have to work with async code, but it's not a bad idea to get into it if you're dealing with the web.
As sample data, I'll use jsonplaceholder's "todo" list. It provides json data, not csv data, but it gives a simple enough structure that can serve our purpose in the example below.
This is the core function, which fetches from jsonplaceholder in a similar way to your "testWebService123" site, although I'm just getting the first 3 todo's, as opposed to testing for when I've hit the last page (you would probably keep your do-while) logic on that one.
async void DownloadPagesAsync() {
for (var i = 1; i < 3; i++) {
var pageToGet = $"https://jsonplaceholder.typicode.com/todos/{i}";
using (var client = new HttpClient())
using (HttpResponseMessage response = await client.GetAsync(pageToGet))
using (HttpContent content = response.Content)
using (var stream = (MemoryStream) await content.ReadAsStreamAsync())
using (var sr = new StreamReader(stream))
while (!sr.EndOfStream) {
var row =
sr.ReadLine()
.Replace(#"""", "")
.Replace(",", "");
if (row.IndexOf(":") == -1)
continue;
var values = row.Split(':');
Console.WriteLine($"{values[0]}, {values[1]}");
}
}
}
This is how you would call the function, such as you would in a Main() method:
Task t = new Task(DownloadPagesAsync);
t.Start();
The new task, here is taking in an "action", or or in other words a function that returns void, as a parameter. Then you start the task. Be careful, it is asynchronous, so any code you have after t.Start() may very well run before your task completes.
As to your question as to whether the stream reads "in memory" or not, running GetType() on "stream" in the code resulted in a "MemoryStream" type, though it seems to only be recognized as a "Stream" object at compile time. A MemoryStream is definately in-memory. I'm not really sure if any of the other kinds of stream objects save temporary files behind the scenes, but I'm leaning towards not.
But looking into the inner workings of a class, though commendable, is not usually required for your anxiety about disposing. For any class, just see if it implements IDisposable. If it does, then put in in a "using" statement, as you have done in your code. When the program terminates, as expected or via error, the program will implement the proper disposures after control has passed out of the "using" block.
HttpClient is in fact the newer approach. From what I understand, it does not replace all of the functionality for WebClient, but is stronger in many respects. See this SO site for more details comparing the two classes.
Also, something to know about WebClient is that it can be simple, but limiting. If you run into issues, you will need to look into the HttpWebRequest class, which is a "lower level" class that gives you greater access to the nuts and bolts of things (such as working with cookies).

Manifest-025 'ChecksumAugmentationNum' Issue on .net

I am submitting the ACA forms(tax year:2016) to the IRS, getting the below error
<ns3:FormBCTransmitterSubmissionDtl xmlns="urn:us:gov:treasury:irs:ext:aca:air:ty16" xmlns:ns2="urn:us:gov:treasury:irs:common" xmlns:ns3="urn:us:gov:treasury:irs:msg:form1094-1095BCtransmittermessage">
<ACATransmitterSubmissionDetail>
<TransmitterErrorDetailGrp>
<ns2:ErrorMessageDetail>
<ns2:ErrorMessageCd>MANIFEST-025</ns2:ErrorMessageCd>
<ns2:ErrorMessageTxt>Manifest 'ChecksumAugmentationNum' must match the IRS-calculated 'ChecksumAugmentationNum' value of the transmission</ns2:ErrorMessageTxt>
</ns2:ErrorMessageDetail>
</TransmitterErrorDetailGrp>
</ACATransmitterSubmissionDetail>
Attached is our MTOM format we are using to send it through A2A.
https://www.dropbox.com/home?preview=samplemtom.txt
I am also tried the ChecksumAugmentationNum value set as Lower case also.
Have you successfully transmitted for Tax Year 2015? I have seen another post related to this issue, but have not run into this issue while sending TY2015 (to AATS or Production) or TY2016 records to AATS. My checksum calculation has not changed, and is very simple.
I have two methods I use to create the checksum: GetChecksum(string) and GetMD5Hash(MD5, string). This approach worked for TY2015, and I expect it to work for TY2016. IIRC, I took this approach directly from MSDN.
The string I pass into the GetChecksum method is the contents of the Form Data Attachment. In my process, I output the XML document into the file system for audit purposes, so the attachment is a physical file for me to use and reference. I read the attachment into a string variable using File.ReadAllText(string path) method.
After generating the checksum my process also will check the checksum against the database and return whether or not that checksum exists (meaning it was used by another form). In the case where this is true, then I update the Contact Suffix, regenerate the Form Data and then regenerate the checksum; this is per the IRS rules for transmission.
This is what is currently working for me, and hopefully this helps you.
Application Callers:
This is what I am doing to call the Checksum calculation functions/routines. It should be noted, I am physically writing each Form Data XML file to the File System, then reading from that.
string AttachmentFileContents = "";
AttachmentFileContents = File.ReadAllText(FormDataFilePath);
string checkSumAugmentationNumber = new Checksum().GetChecksum(AttachmentFileContents);
Checksum Methods:
These are the two methods I use for Checksum Calculation.
public string GetChecksum(string stringToEncrpyt)
{
string hash = "";
using(MD5 md5Hash = MD5.Create())
{
hash = GetMD5Hash(md5Hash, stringToEncrpyt);
}
return hash;
}
private string GetMD5Hash(MD5 md5Hash, string input)
{
byte[] data = md5Hash.ComputeHash(Encoding.UTF8.GetBytes(input));
StringBuilder sb = new StringBuilder();
for (int i = 0; i < data.Length; i++)
{
sb.Append(data[i].ToString("x2"));
}
return sb.ToString();
}

Removing ShellFile Properties

I ran into a problem this week regarding the Windows Shell Property System when applied to TIFF/TIF files. I'm using Microsoft.WindowsAPICodePack 1.1.0.0 to access the property system.
When adding properties, the file gets corrupted because it seems to get stored where the first IFD pointer would be expected. Now I'm not sure it simply inserts itself at the 5th byte, after the file header (0x49 0x49 0x2A 0x00), of if it overwrites any existing data. Additionally, when comparing the hexadecimal of the IDF entries headers, the bytes looks different. Now when I say corrupted, it is only when programmatically opening the file as a byte stream, not knowing if the file has a property system container added to it. It opens fine in Windows Image Preview, but not in the software my clients are using to view TIFF files.
Here's how I'm adding the properties (as an array of key=value strings).
public void SetFileTag(string fileName, string tagName, string tagValue)
{
try
{
using (var shellFile = ShellFile.FromFilePath(fileName))
{
var keywords = shellFile.Properties.System.Keywords.Value;
var keyValue = string.Concat(tagName, "=", tagValue);
var list = keywords == null ? new List<string>() : new List<string>(keywords);
if (list.Contains(keyValue))
{
return;
}
list.Add(keyValue);
using (var writer = shellFile.Properties.GetPropertyWriter())
{
writer.WriteProperty(shellFile.Properties.System.Keywords, list.ToArray(), true);
writer.Close();
}
}
}
finally
{
GC.Collect();
GC.WaitForPendingFinalizers();
}
}
I looked in the code pack for anything available to entirely remove the properties, but I can't find any method to do so, I can only remove the keywords value. Anyone would have an idea on how to perform this? It doesn't have to be .NET code, it can very well be a command-line tool or win32 code.

How to insert and read a pdf file to Sql Server 2005 database using C#

How to insert the pdf file into sqlserver 2005 and read the pdf file from sqlserver?
If you are interested into using database for file storage, look at this 4guysfromrolla article. It's web oriented, but there should be no problem finding what you need.
To put it into the database you must read it into a byte array. Either read it from the file system or use the AspNetFileUploadWebControl.FileBytes property. Create an insert stored procedure and add the byte array as the parameter for your DB column (the column must be of SQL data type "image").
To get it out of the database, use something like:
theRow = getDatarowFromDatabase();
aByteArrayOfTheFile = (byte[])theRow["theSqlImageColumnWithTheFileInIt"];
To allow the user to view or download it use my method SendAsFileToBrowser():
SendAsFileToBrowser(aByteArrayOfTheFile, "application/pdf", "downloaded.pdf");
The source code for the method (with overloads):
// Stream a binary file to the user's web browser so they can open or save it.
public static void SendAsFileToBrowser(byte[] File, string Type, string FileName)
{
string disp = "attachment";
if (string.IsNullOrEmpty(FileName))
{
disp = "inline";
}
// set headers
var r = HttpContext.Current.Response;
r.ContentType = Type; // eg "image/Png"
r.Clear();
r.AddHeader("Content-Type", "binary/octet-stream");
r.AddHeader("Content-Length", File.Length.ToString());
r.AddHeader("Content-Disposition", disp + "; filename=" + FileName + "; size=" + File.Length.ToString());
r.Flush();
// write data to requesting browser
r.BinaryWrite(File);
r.Flush();
}
//overload
public static void SendAsFileToBrowser(byte[] File, string Type)
{
SendAsFileToBrowser(File, Type, "");
}
// overload
public static void SendAsFileToBrowser(System.IO.Stream File, string Type, string FileName)
{
byte[] buffer = new byte[File.Length];
int length = (int)File.Length;
File.Write(buffer, 0, length - 1);
SendAsFileToBrowser(buffer, FileName, Type);
}
Essentially, you are just talking about BLOB storage and retrieval (of image or varbinary(max) data). See this question: Streaming directly to a database

Categories