I'm trying out to use the XML - Mind converter https://www.xmlmind.com/foconverter/ to convert some xsl-fo to an rtf and this works well . Just to be clear this is nothing specific to the conveter or its functionality but just a clarification I would like to get which is why I am asking this on stack overflow .
So I have the following code (that was obtained from some documentation)
string foFilePath = #"D:\Temp\test.fo";
string ourPutFilePath = #"D:\Temp\test.rtf";
Converter converter = new Converter();
converter.OutputFormat = OutputFormat.Rtf;
converter.OutputEncoding = "windows-1252";
converter.ImageResolution = 120;
converter.SetInput(foFilePath);
converter.SetOutput(ourPutFilePath);
converter.Convert();
What happens here is quite simple Reads a file from the input path and stores the converted file in the specified output . The question I would like to clarify here is , wheather it would be possible to store this content that is being saved in the file out put path within a variable as well to may be do some processing during the application runtime ?
Maybe I can use the MemoryStream for it ? I'm just not sure how to do it and would really appreciate some help here.
I understand that I can always read it back from the file output path but I am looking for something better than that as saving the file to a certain location may not always be possible in my case
EDIT :- The converter.SetOutput() method allows 3 overloads in the parameters
String fileName
Stream stream
TextWriter writer
Sine you need the output as a string you could try doing something like this
string content;
using (var stream = new MemoryStream())
{
using (var writer = new StreamWriter(stream))
{
Converter converter = new Converter();
converter.OutputFormat = OutputFormat.Rtf;
converter.OutputEncoding = "windows-1252";
converter.ImageResolution = 120;
converter.SetInput(foFilePath);
converter.SetOutput(writer);
converter.Convert();
stream.Position = 0;
content = Encoding.UTF8.GetString(stream.ToArray());
}
}
I'm not sure about the Encoding though, and if the Convert() uses a different encoding this might not work
I call an API to get a PDF file. The API returns it as a string with binary data.
Now I need to save it to a file without any conversion of the data.
How can I do this in C#?
I have been trying
string file = await service.GetDocumentsAsync(document.FileId); // Gets the filedata
byte[] byteArray = file.Select (c => (byte)c).ToArray ();
using (var stream = new FileStream($"c:\\temp\\{document.Id}.pdf", FileMode.Create))
{
stream.Write (byteArray,0,file.Length);
stream.Close ();
}
I do get the PDF, but it only has blank pages.
The beginning of the string when i look at it in the Debugger:
As suggested we had to change what the API returned. We use RestSharp and had to use Response.RawByte iso. Response.
I want to send a url as query string e.g.
localhost/abc.aspx?url=http:/ /www.site.com/report.pdf
and detect if the above URL returns the PDF file. If it will return PDF then it gets saved automatically otherwise it gives error.
There are some pages that uses Handler to fetch the files so in that case also I want to detect and download the same.
localhost/abc.aspx?url=http:/ /www.site.com/page.aspx?fileId=223344
The above may return a pdf file.
What is best way to capture this?
Thanks
You can download a PDF like this
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(uri);
HttpWebResponse response = req.GetResponse();
//check the filetype returned
string contentType = response.ContentType;
if(contentType!=null)
{
splitString = contentType.Split(';');
fileType = splitString[0];
}
//see if its PDF
if(fileType!=null && fileType=="application/pdf"){
Stream stream = response.GetResponseStream();
//save it
using(FileStream fileStream = File.Create(fileFullPath)){
// Initialize the bytes array with the stream length and then fill it with data
byte[] bytesInStream = new byte[stream.Length];
stream.Read(bytesInStream, 0, bytesInStream.Length);
// Use write method to write to the file specified above
fileStream.Write(bytesInStream, 0, bytesInStream.Length);
}
}
response.Close();
The fact that it may come from an .aspx handler doesn't actually matter, it's the mime returned in the server response that is used.
If you are getting a generic mime type, like application/octet-stream then you must use a more heuristical approach.
Assuming you cannot simply use the file extension (eg for .aspx), then you can copy the file to a MemoryStream first (see How to get a MemoryStream from a Stream in .NET?). Once you have a memory stream of the file, you can take a 'cheeky' peek at it (I say cheeky because it's not the correct way to parse a PDF file)
I'm not an expert on PDF format, but I believe reading the first 5 chars with an ASCII reader will yield "%PDF-", so you can identify that with
bool isPDF;
using( StreamReader srAsciiFromStream = new StreamReader(memoryStream,
System.Text.Encoding.ASCII)){
isPDF = srAsciiFromStream.ReadLine().StartsWith("%PDF-");
}
//set the memory stream back to the start so you can save the file
memoryStream.Position = 0;
i'm trying to turn a wav file into a string I can send to the server as a part of a json object, so that on the server I can turn that string back into a file.
i have tried to use readAsBinaryString and read as text, can't get past error in reading the string into a byte array.
reader.onloadend = saveMedia;
reader.readAsText(Blob);
//reader.readAsBinaryString(Blob); also tried.
then the callback sends an ajax request with an object holding the string in "reader.result" and on the server i tried things like:
System.Text.UTF8Encoding encoding = new System.Text.UTF8Encoding();
byte[] BinaryData = encoding.GetBytes(stringFromRequest);
the answers to this question below seem to be that this should not be done. but i really want to do it this way because of another tool I am using (breeze js). don't want to use a separate post action with a file data type.
releted:
File API - Blob to JSON
Found a way that works:
var reader = new FileReader();
reader.onloadend = afterRead;
reader.readAsBinaryString(blob);
function afterRead() {
// convert binary string to base64 for a safe transfer to the server.
entity.BinaryProp = window.btoa(reader.result);
}
on the server-side:
string BinaryString = (string)entityInfo.UnmappedValuesMap["BinaryProp"];
byte[] BinaryData = Convert.FromBase64String(BinaryString);
You can use fileReader.readAsDataURL(fileObject), this encode blob to base64, which can safely upload to server by API.
var reader = new FileReader();
reader.readAsDataURL(blob);
reader.onloadend = () => {
let thumbnail = reader.result;
console.log(thumbnail)
//send to API
};
The answer above is great, but there is a simpler way.
var reader = new FileReader();
reader.onloadend = afterRead;
reader.readAsDataURL(blob); // Use this function instead
function afterRead() {
entity.BinaryProp = reader.result; //result is already a base64 string!
}
See documentation here: FileReader.readAsDataURL()
I have a stream of bytes which actually (if put right) will form a valid Word file, I need to convert this stream into a Word file without writing it to disk, I take the original stream from SQL Server database table:
ID Name FileData
----------------------------------------
1 Word1 292jf2jf2ofm29fj29fj29fj29f2jf29efj29fj2f9 (actual file data)
the FileData field carries the data.
Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();
Microsoft.Office.Interop.Word.Document doc = new Microsoft.Office.Interop.Word.Document();
doc = word.Documents.Open(#"C:\SampleText.doc");
doc.Activate();
The above code opens and fill a Word file from File System, I don't want that, I want to define a new Microsoft.Office.Interop.Word.Document, but I want to fill its content manually from byte stream.
After getting the in-memory Word document, I want to do some parsing of keywords.
Any ideas?
Create an in memmory file system, there are drivers for that.
Give word a path to an ftp server path (or something else) which you then use to push the data.
One important thing to note: storing files in a database is generally not good design.
You could look at how Sharepoint solves this. They have created a web interface for documents stored in their database.
Its not that hard to create or embed a webserver in your application that can serve pages to Word. You don't even have to use the standard ports.
There probably isn't any straight-forward way of doing this. I found a couple of solutions searching for it:
Use the OpenOffice SDK to manipulate the document instead of Word
Interop
Write the data to the clipboard, and then from the Clipboard to Word
I don't know if this does it for you, but apparently the API doesn't provide what you're after (unfortunately).
There are really only 2 ways to open a Word document programmatically - as a physical file or as a stream. There's a "package", but that's not really applicable.
The stream method is covered here: https://learn.microsoft.com/en-us/office/open-xml/how-to-open-a-word-processing-document-from-a-stream
But even it relies on there being a physical file in order to form the stream:
string strDoc = #"C:\Users\Public\Public Documents\Word13.docx";
Stream stream = File.Open(strDoc, FileMode.Open);
The best solution I can offer would be to write the file out to a temp location where the service account for the application has permission to write:
string newDocument = #"C:\temp\test.docx";
WriteFile(byteArray, newDocument);
If it didn't have permissions on the "temp" folder in my example, you would simply just add the service account of your application (application pool, if it's a website) to have Full Control of the folder.
You'd use this WriteFile() function:
/// <summary>
/// Write a byte[] to a new file at the location where you choose
/// </summary>
/// <param name="byteArray">byte[] that consists of file data</param>
/// <param name="newDocument">Path to where the new document will be written</param>
public static void WriteFile(byte[] byteArray, string newDocument)
{
using (MemoryStream stream = new MemoryStream())
{
stream.Write(byteArray, 0, (int)byteArray.Length);
// Save the file with the new name
File.WriteAllBytes(newDocument, stream.ToArray());
}
}
From there, you can open it with OpenXML and edit the file. There's no way to open a Word document in byte[] form directly into an instance of Word - Interop, OpenXML, or otherwise - because you need a documentPath, or the stream method mentioned earlier that relies on there being a physical file. You can edit the bytes you would get by reading the bytes into a string, and XML afterwards, or just edit the string, directly:
string docText = null;
byte[] byteArray = null;
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(documentPath, true))
{
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd(); // <-- converts byte[] stream to string
}
// Play with the XML
XmlDocument xml = new XmlDocument();
xml.LoadXml(docText); // the string contains the XML of the Word document
XmlNodeList nodes = xml.GetElementsByTagName("w:body");
XmlNode chiefBodyNode = nodes[0];
// add paragraphs with AppendChild...
// remove a node by getting a ChildNode and removing it, like this...
XmlNode firstParagraph = chiefBodyNode.ChildNodes[2];
chiefBodyNode.RemoveChild(firstParagraph);
// Or play with the string form
docText = docText.Replace("John","Joe");
// If you manipulated the XML, write it back to the string
//docText = xml.OuterXml; // comment out the line above if XML edits are all you want to do, and uncomment out this line
// Save the file - yes, back to the file system - required
using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
{
sw.Write(docText);
}
}
// Read it back in as bytes
byteArray = File.ReadAllBytes(documentPath); // new bytes, ready for DB saving
Reference:
https://learn.microsoft.com/en-us/office/open-xml/how-to-search-and-replace-text-in-a-document-part
I know it's not ideal, but I have searched and not found a way to edit the byte[] directly without a conversion that involves writing out the file, opening it in Word for the edits, then essentially re-uploading it to recover the new bytes. Doing byte[] byteArray = Encoding.UTF8.GetBytes(docText); prior to re-reading the file will corrupt them, as would any other Encoding I tried (UTF7,Default,Unicode, ASCII), as I found when I tried to write them back out using my WriteFile() function, above, in that last line. When not encoded and simply collected using File.ReadAllBytes(), and then writing the bytes back out using WriteFile(), it worked fine.
Update:
It might be possible to manipulate the bytes like this:
//byte[] byteArray = File.ReadAllBytes("Test.docx"); // you might be able to assign your bytes here, instead of from a file?
byte[] byteArray = GetByteArrayFromDatabase(fileId); // function you have for getting the document from the database
using (MemoryStream mem = new MemoryStream())
{
mem.Write(byteArray, 0, (int)byteArray.Length);
using (WordprocessingDocument wordDoc =
WordprocessingDocument.Open(mem, true))
{
// do your updates -- see string or XML edits, above
// Once done, you may need to save the changes....
//wordDoc.MainDocumentPart.Document.Save();
}
// But you will still need to save it to the file system here....
// You would update "documentPath" to a new name first...
string documentPath = #"C:\temp\newDoc.docx";
using (FileStream fileStream = new FileStream(documentPath,
System.IO.FileMode.CreateNew))
{
mem.WriteTo(fileStream);
}
}
// And then read the bytes back in, to save it to the database
byteArray = File.ReadAllBytes(documentPath); // new bytes, ready for DB saving
Reference:
https://learn.microsoft.com/en-us/previous-versions/office/office-12//ee945362(v=office.12)
But note that even this method will require saving the document, then reading it back in, in order to save it to bytes for the database. It will also fail if the document is in .doc format instead of .docx on that line where the document is being opened.
Instead of that last section for saving the file to the file system, you could just take the memory stream and save that back into bytes once you are outside of the WordprocessingDocument.Open() block, but still inside the using (MemoryStream mem = new MemoryStream() { ... } statement:
// Convert
byteArray = mem.ToArray();
This will have your Word document byte[].