I'm attempting to dump some RTF from the clipboard to a file.
Essentially, if the application sees that the user has RTF in the clipboard when they paste, it dumps that RTF to a file that was specified earlier.
The code I was trying to use is as follows:
private void saveTextLocal(bool plainText = true)
{
object clipboardGetData = Clipboard.GetData(DataFormats.Rtf);
string fileName = filename();
using (FileStream fs = File.Create(fileLoc)) { };
File.WriteAllBytes(fileLoc, ObjectToByteArray(clipboardGetData));
}
private byte[] ObjectToByteArray(Object obj)
{
if (obj == null)
{
return null;
}
BinaryFormatter bf = new BinaryFormatter();
MemoryStream ms = new MemoryStream();
bf.Serialize(ms, obj);
return ms.ToArray();
}
This appears to almost work, producing the following information as the file:
ÿÿÿÿ ‰{\rtf1\ansi\deff0\deftab480
{\fonttbl
{\f000 Courier New;}
{\f001 Courier New;}
{\f002 Courier New;}
{\f003 Courier New;}
}
{\colortbl
\red128\green128\blue128;
\red255\green255\blue255;
\red000\green000\blue128;
\red255\green255\blue255;
\red000\green000\blue000;
\red255\green255\blue255;
\red000\green000\blue000;
\red255\green255\blue255;
}
\f0\fs20\cb7\cf6 \highlight5\cf4 Console\highlight3\cf2\b .\highlight5\cf4\b0 WriteLine\highlight3\cf2\b (\highlight1\cf0\b0 "pie!"\highlight3\cf2\b )}
That looks almost right. The file I'm copying from looks like this when opened in Notepad++:
{\rtf1\ansi\deff0\nouicompat{\fonttbl{\f0\fnil Courier New;}}
{\colortbl ;\red0\green0\blue0;\red255\green255\blue255;\red0\green0\blue128;\red128\green128\blue128;}
{\*\generator Riched20 6.2.9200}\viewkind4\uc1
\pard\cf1\highlight2\f0\fs20\lang2057 Console\cf3\b .\cf1\b0 WriteLine\cf3\b (\cf4\b0 "pie!"\cf3\b )\cf1\b0\par
}
Did I do something obviously wrong, and if so - how would I amend my code to fix it?
Thanks in advance!
The issue was, as madamission quite rightly pointed out, that RTF is ASCII text, not binary, so running it through a binary serializer was wholly the wrong direction. BinaryFormatter writes its own header bytes ahead of the string, which is where the ÿÿÿÿ at the start of the file came from.
Instead, I cast the clipboard data object to a string and wrote it out as you would a normal text file. This produced the file I was expecting. The following is the working code for anyone who might find this:
private void saveTextLocal(bool plainText = true)
{
    //First, cast the clipboard contents to string. Remember to specify DataFormat!
    string clipboardGetData = (string)Clipboard.GetData(DataFormats.Rtf);
    //This is irrelevant to the question, in my method it generates a unique filename
    string fileName = filename();
    //Start a StreamWriter pointed at the destination file
    using (StreamWriter writer = File.CreateText(filePath + ".rtf"))
    {
        //Write the entirety of the clipboard to that file
        writer.Write(clipboardGetData);
    }
    //Leaving the using block closes the StreamWriter
}
I think RTF is plain ASCII text rather than binary, so you should use a TextWriter instead of the BinaryFormatter.
There are some related solutions here: How to create RTF from plain text (or string) in C#?
I have a scenario with a class like this:
class Document
{
public string Name {get;set;}
public byte[] Contents {get;set;}
}
Now I am trying to implement import/export functionality where I keep the document in binary form, so the document ends up in a JSON file along with the other fields, and its contents look something like this:
UEsDBBQABgAIAAAAIQCitGbRsgEAALEHAAATAAgCW0NvbnRlbnRfVHlwZXNdLnhtbCCiBAIooAACAAAAAAA==
Now when I upload this file back, I get the document as a string with the same data, but when I try to convert it into a byte[] the file becomes corrupt.
How can I achieve this?
I use something like this to convert:
var ss = sr.ReadToEnd();
MemoryStream stream = new MemoryStream();
StreamWriter writer = new StreamWriter(stream);
writer.Write(ss);
writer.Flush();
stream.Position = 0;
var bytes = default(byte[]);
bytes = stream.ToArray();
This looks like base 64. Use:
System.Convert.ToBase64String(b)
https://msdn.microsoft.com/en-us/library/dhx0d524%28v=vs.110%29.aspx
And
System.Convert.FromBase64String(s)
https://msdn.microsoft.com/en-us/library/system.convert.frombase64string%28v=vs.110%29.aspx
You need to decode it from Base64, like this, assuming you've read the file into ss as a string:
var bytes = Convert.FromBase64String(ss);
There are several things going on here. A StreamWriter created without an explicit encoding uses UTF-8. .NET strings, on the other hand, are always UTF-16 internally. Writing the string through a StreamWriter therefore gives you the UTF-8 bytes of the Base64 characters, not the bytes that the Base64 text represents.
MemoryStream from string - confusion about Encoding to use
I would suggest using System.Convert.ToBase64String(someByteArray) and its counterpart System.Convert.FromBase64String(someString) to handle this for you.
I have a large amount of data in the form of bytes, around 5 GB.
I need to store this data in a file, ServerData.xml. The data should first be converted into a string and then saved to the file so that we can perform operations on it.
I used the code below to convert the stream of bytes to a string and then save it to a file.
private const string fileName = "ServerData.xml";
public void ProcessBuffer(byte[] receiveBuffer, int bytes)
{
if (!File.Exists(fileName))
{
using (File.Create(fileName)) { };
}
TextWriter tw = new StreamWriter(fileName, true);
tw.Write(Encoding.UTF8.GetString(receiveBuffer).TrimEnd((Char)0));
tw.Close();
}
Is this the right way?
Or please suggest a better way, so that there are no memory issues in the future.
The code in your question can only work if ProcessBuffer is always called with a UTF-8 encoded text that is broken on code point boundaries. That seems pretty unlikely to me, so I would expect that you encounter errors when decoding to text.
However, decoding to text and then writing, is rather pointless and indeed counter-productive. The bytes are already UTF-8 encoded. Write them directly to file as they arrive from the socket. Don't perform any processing of them. When you come to read the XML using XmlReader, the parser will read the encoding as UTF-8 from the document's XML declaration, and be able to decode the rest of the document. I am assuming that the document's XML declaration specifies UTF-8 but that seems highly likely. You should check.
You should get rid of the text writer which is no use to you for writing bytes. Write the bytes directly to a file stream. And try to avoid opening and closing the file repeatedly. That's very inefficient. Open and close the file exactly once.
Why do you need to convert it to a string?
using System.IO;
using System.Text;
public static void WriteBytes(byte[] bytes, string filename)
{
    // OpenOrCreate neither truncates nor appends: writing starts at position 0 of any existing file
    using (FileStream fs = new FileStream(filename, FileMode.OpenOrCreate))
    using (BinaryWriter writer = new BinaryWriter(fs, Encoding.UTF8))
    {
        writer.Write(bytes);
    }
}
You can simply write these bytes to a file using FileStream:
public void ProcessBuffer(byte[] receivedBuffer, int bytes)
{
using (var fileStream = new FileStream(fileName, FileMode.Create)) // overwrites file
{
fileStream.Write(receivedBuffer, 0, bytes);
}
}
Update: You won't be able to work with such a big XML document if you don't have enough resources. I would suggest reformatting this file. For example, I would parse this XML and insert data into a SQL database. Then, you can easily operate with such amounts of data.
I would prefer to write all the bytes to the file and, when reading, convert them to a string and then parse that into XML using XDocument, XElement, etc. Writing the bytes directly saves space and is efficient.
Instead of using FileStream, I would prefer the File.WriteAllBytes method.
private const string fileName = "ServerData.xml";
public void ProcessBuffer(byte[] receiveBuffer, int bytes)
{
    // Writes the whole buffer and replaces any existing file
    File.WriteAllBytes(fileName, receiveBuffer);
    // And when reading
    byte[] data = File.ReadAllBytes(fileName);
    // Decode the bytes (assumed to be UTF-8) and parse the XML from the string
    string xml = Encoding.UTF8.GetString(data);
}
I would like to know the best way to create a simple html file using c#.
Is it using something like System.IO.File.Create?
Something like -
using (FileStream fs = new FileStream("test.htm", FileMode.Create))
{
using (StreamWriter w = new StreamWriter(fs, Encoding.UTF8))
{
w.WriteLine("<H1>Hello</H1>");
}
}
I'll say that File.WriteAllText is a stupid-proof way to write a text file on .NET 3.5 or later.
File.WriteAllText("myfile.htm", @"<html><body>Hello World</body></html>");
I'll even say that File.WriteAllLines is stupid-proof enough to write bigger HTML without fighting too much with string composition. But the "good" version is only available from .NET 4.0 (a slightly worse version works on .NET >= 2.0).
List<string> lines = new List<string>();
lines.Add("<html>");
lines.Add("<body>");
lines.Add("Hello World");
lines.Add("</body>");
lines.Add("</html>");
File.WriteAllLines("myfile.htm", lines);
// With .NET 3.5
File.WriteAllLines("myfile.htm", lines.ToArray());
I would go with File.Create and then open a StreamWriter to that file if you don't have all the data when you create the file.
This is an example from MS that may help you:
class Test
{
public static void Main()
{
string path = @"c:\temp\MyTest.txt";
// Create the file.
using (FileStream fs = File.Create(path, 1024))
{
Byte[] info = new UTF8Encoding(true).GetBytes("This is some text in the file.");
// Add some information to the file.
fs.Write(info, 0, info.Length);
}
// Open the stream and read it back.
using (StreamReader sr = File.OpenText(path))
{
string s = "";
while ((s = sr.ReadLine()) != null)
{
Console.WriteLine(s);
}
}
}
}
Have a look at the HtmlTextWriter class. For an example of how to use this class, see http://www.dotnetperls.com/htmltextwriter.
Reading and writing text files and MSDN info. HTML is just a simple text file with *.HTML extension ;)
Simply opening a file for writing (using File.OpenWrite() for example) will create the file if it does not yet exist.
If you have a look at http://msdn.microsoft.com/en-us/library/d62kzs03.aspx you can find an example of creating a file.
But how do you want to create the html file content? If that's just static then you can just write it to a file.. if you have to create the html on the fly you could use an ASPX file with the correct markup and use a Server.Execute to get the HTML as a string.
Yep, System.IO.File.Create(Path) will create your file just fine.
You can also use a FileStream and write to it, which seems handier for writing an .htm file.
I have a stream of bytes that, put together correctly, forms a valid Word file. I need to convert this stream into a Word document without writing it to disk. The original stream comes from a SQL Server database table:
ID Name FileData
----------------------------------------
1 Word1 292jf2jf2ofm29fj29fj29fj29f2jf29efj29fj2f9 (actual file data)
The FileData field carries the data.
Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();
Microsoft.Office.Interop.Word.Document doc = new Microsoft.Office.Interop.Word.Document();
doc = word.Documents.Open(@"C:\SampleText.doc");
doc.Activate();
The above code opens and fills a Word document from the file system. I don't want that: I want to define a new Microsoft.Office.Interop.Word.Document and fill its content manually from the byte stream.
After getting the in-memory Word document, I want to do some parsing of keywords.
Any ideas?
Create an in-memory file system; there are drivers for that.
Give Word a path to an FTP server (or something similar), which you then use to push the data.
One important thing to note: storing files in a database is generally not good design.
You could look at how Sharepoint solves this. They have created a web interface for documents stored in their database.
It's not that hard to create or embed a web server in your application that can serve pages to Word. You don't even have to use the standard ports.
There probably isn't any straightforward way of doing this. I found a couple of solutions searching for it:
Use the OpenOffice SDK to manipulate the document instead of Word Interop
Write the data to the clipboard, and then from the clipboard to Word
I don't know if this does it for you, but apparently the API doesn't provide what you're after (unfortunately).
There are really only 2 ways to open a Word document programmatically - as a physical file or as a stream. There's a "package", but that's not really applicable.
The stream method is covered here: https://learn.microsoft.com/en-us/office/open-xml/how-to-open-a-word-processing-document-from-a-stream
But even it relies on there being a physical file in order to form the stream:
string strDoc = @"C:\Users\Public\Public Documents\Word13.docx";
Stream stream = File.Open(strDoc, FileMode.Open);
The best solution I can offer would be to write the file out to a temp location where the service account for the application has permission to write:
string newDocument = @"C:\temp\test.docx";
WriteFile(byteArray, newDocument);
If the application doesn't have permissions on the "temp" folder in my example, you would simply grant its service account (the application pool, if it's a website) Full Control of the folder.
You'd use this WriteFile() function:
/// <summary>
/// Write a byte[] to a new file at the location where you choose
/// </summary>
/// <param name="byteArray">byte[] that consists of file data</param>
/// <param name="newDocument">Path to where the new document will be written</param>
public static void WriteFile(byte[] byteArray, string newDocument)
{
using (MemoryStream stream = new MemoryStream())
{
stream.Write(byteArray, 0, (int)byteArray.Length);
// Save the file with the new name
File.WriteAllBytes(newDocument, stream.ToArray());
}
}
From there, you can open it with OpenXML and edit the file. There's no way to open a Word document in byte[] form directly into an instance of Word - Interop, OpenXML, or otherwise - because you need a documentPath, or the stream method mentioned earlier that relies on there being a physical file. You can edit the document by reading the main document part into a string and then either loading that string as XML or editing the string directly:
string docText = null;
byte[] byteArray = null;
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(documentPath, true))
{
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd(); // <-- converts byte[] stream to string
}
// Play with the XML
XmlDocument xml = new XmlDocument();
xml.LoadXml(docText); // the string contains the XML of the Word document
XmlNodeList nodes = xml.GetElementsByTagName("w:body");
XmlNode chiefBodyNode = nodes[0];
// add paragraphs with AppendChild...
// remove a node by getting a ChildNode and removing it, like this...
XmlNode firstParagraph = chiefBodyNode.ChildNodes[2];
chiefBodyNode.RemoveChild(firstParagraph);
// Or play with the string form
docText = docText.Replace("John","Joe");
// If you manipulated the XML, write it back to the string
//docText = xml.OuterXml; // comment out the line above if XML edits are all you want to do, and uncomment out this line
// Save the file - yes, back to the file system - required
using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
{
sw.Write(docText);
}
}
// Read it back in as bytes
byteArray = File.ReadAllBytes(documentPath); // new bytes, ready for DB saving
Reference:
https://learn.microsoft.com/en-us/office/open-xml/how-to-search-and-replace-text-in-a-document-part
I know it's not ideal, but I have searched and not found a way to edit the byte[] directly without a conversion that involves writing out the file, opening it in Word for the edits, then essentially re-uploading it to recover the new bytes. Doing byte[] byteArray = Encoding.UTF8.GetBytes(docText); prior to re-reading the file will corrupt them, as would any other Encoding I tried (UTF7,Default,Unicode, ASCII), as I found when I tried to write them back out using my WriteFile() function, above, in that last line. When not encoded and simply collected using File.ReadAllBytes(), and then writing the bytes back out using WriteFile(), it worked fine.
Update:
It might be possible to manipulate the bytes like this:
//byte[] byteArray = File.ReadAllBytes("Test.docx"); // you might be able to assign your bytes here, instead of from a file?
byte[] byteArray = GetByteArrayFromDatabase(fileId); // function you have for getting the document from the database
using (MemoryStream mem = new MemoryStream())
{
mem.Write(byteArray, 0, (int)byteArray.Length);
using (WordprocessingDocument wordDoc =
WordprocessingDocument.Open(mem, true))
{
// do your updates -- see string or XML edits, above
// Once done, you may need to save the changes....
//wordDoc.MainDocumentPart.Document.Save();
}
// But you will still need to save it to the file system here....
// You would update "documentPath" to a new name first...
string documentPath = @"C:\temp\newDoc.docx";
using (FileStream fileStream = new FileStream(documentPath,
System.IO.FileMode.CreateNew))
{
mem.WriteTo(fileStream);
}
}
// And then read the bytes back in, to save it to the database
byteArray = File.ReadAllBytes(documentPath); // new bytes, ready for DB saving
Reference:
https://learn.microsoft.com/en-us/previous-versions/office/office-12//ee945362(v=office.12)
But note that even this method will require saving the document, then reading it back in, in order to save it to bytes for the database. It will also fail if the document is in .doc format instead of .docx on that line where the document is being opened.
Instead of that last section for saving the file to the file system, you could just take the memory stream and convert it back into bytes once you are outside of the WordprocessingDocument.Open() block, but still inside the using (MemoryStream mem = new MemoryStream()) { ... } statement:
// Convert
byteArray = mem.ToArray();
This gives you your Word document as a byte[].
I have a Base64-encoded object with the following header:
application/x-xfdl;content-encoding="asc-gzip"
What is the best way to proceed in decoding the object? Do I need to strip the first line? Also, if I turn it into a byte array (byte[]), how do I un-gzip it?
Thanks!
I think I misspoke initially. By saying the header was
application/x-xfdl;content-encoding="asc-gzip"
I meant this was the first line of the file. So, in order to use the Java or C# libraries to decode the file, does this line need to be stripped?
If so, what would be the simplest way to strip the first line?
To decode the Base64 content in C# you can use the Convert Class static methods.
byte[] bytes = Convert.FromBase64String(base64Data);
You can also use the GZipStream Class to help deal with the GZipped stream.
Another option is SharpZipLib. This will allow you to extract the original data from the compressed data.
I was able to use the following code to convert an .xfdl document into a Java DOM Document.
I used iHarder's Base64 utility to do the Base64 Decode.
private static final String FILE_HEADER_BLOCK =
"application/vnd.xfdl;content-encoding=\"base64-gzip\"";
public static Document OpenXFDL(String inputFile)
throws IOException,
ParserConfigurationException,
SAXException
{
try{
//create file object
File f = new File(inputFile);
if(!f.exists()) {
throw new IOException("Specified File could not be found!");
}
//open file stream from file
FileInputStream fis = new FileInputStream(inputFile);
//Skip past the MIME header
fis.skip(FILE_HEADER_BLOCK.length());
//Decompress from base 64
Base64.InputStream bis = new Base64.InputStream(fis,
Base64.DECODE);
//UnZIP the resulting stream
GZIPInputStream gis = new GZIPInputStream(bis);
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(gis);
gis.close();
bis.close();
fis.close();
return doc;
}
catch (ParserConfigurationException pce) {
throw new ParserConfigurationException("Error parsing XFDL from file.");
}
catch (SAXException saxe) {
throw new SAXException("Error parsing XFDL into XML Document.");
}
}
Still working on successfully modifying and re-encoding the document.
Hope this helps.
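For the re-encoding step mentioned above, here is a minimal sketch in Java. It assumes the file is simply the header line followed by the Base64 text of the gzip-compressed XML, which is inferred from the decoding code and not taken from any XFDL specification. The class name XfdlEncodeSketch is hypothetical, the header constant is copied from the decoding code, and java.util.Base64 (Java 8+) stands in for the iHarder utility. Whether a consuming application expects the Base64 text to be line-wrapped is not something this sketch can confirm.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: reverses the decode steps (gzip, then Base64, then header).
final class XfdlEncodeSketch {
    // Copied from the decoding code above; a real file may carry a different header.
    static final String HEADER = "application/vnd.xfdl;content-encoding=\"base64-gzip\"";

    static String encode(String xml) throws IOException {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the gzip trailer, so close it
        // before reading the compressed bytes.
        try (GZIPOutputStream gzip = new GZIPOutputStream(compressed)) {
            gzip.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        // Basic (unwrapped) Base64; assumption: the reader tolerates a single long line.
        String base64 = Base64.getEncoder().encodeToString(compressed.toByteArray());
        return HEADER + "\n" + base64;
    }
}
```

Calling XfdlEncodeSketch.encode(xmlString) returns the full file contents as a string, which can then be written out as a text file.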
In Java, you can use the Apache Commons Base64 class
String decodedString = new String(Base64.decodeBase64(encodedBytes));
It sounds like you're dealing with data that is both gzipped and Base64 encoded. Once you strip off any MIME headers, you should convert the Base64 data to a byte array using something like Apache Commons Codec. You can then wrap the byte[] in a ByteArrayInputStream object and pass that to a GZIPInputStream, which will let you read the uncompressed data.
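The steps just described can be sketched as follows, using only the JDK (java.util.Base64, Java 8+) in place of Apache Commons Codec. The class name Base64GzipDecodeSketch is hypothetical. The sketch assumes the header occupies exactly the first line, that everything after it is Base64 text, and that the decompressed data is UTF-8; none of this is confirmed by the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: strip the first line, Base64-decode, then gunzip.
final class Base64GzipDecodeSketch {
    static String decode(String fileContents) throws IOException {
        // Drop the first line (assumed to be the MIME-style header).
        int newline = fileContents.indexOf('\n');
        if (newline < 0) {
            throw new IOException("No header line found");
        }
        String base64 = fileContents.substring(newline + 1);

        // The MIME decoder skips line breaks and other non-Base64 characters.
        byte[] compressed = Base64.getMimeDecoder().decode(base64);

        try (InputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            // Append only the bytes actually read on each pass.
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            // Assumption: the uncompressed payload is UTF-8 text.
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Splitting on the first newline avoids hard-coding the header text, so the same code works whether the first line reads x-xfdl or vnd.xfdl.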
For Java, have you tried the built-in java.util.zip package? Alternatively, Apache Commons has the Commons Compress library for working with zip, tar and other compressed file types. As for decoding Base64, there are several open source libraries, or you can use Sun's sun.misc.BASE64Decoder class, though that is an internal, unsupported class; on Java 8 and later, java.util.Base64 is the standard alternative.
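Copied from elsewhere, for Base64 I link to commons-codec-1.6.jar:
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.util.zip.GZIPInputStream;
import org.apache.commons.codec.binary.Base64;
public static String decode(String input) throws Exception {
    byte[] bytes = Base64.decodeBase64(input);
    BufferedReader in = new BufferedReader(new InputStreamReader(
        new GZIPInputStream(new ByteArrayInputStream(bytes))));
    StringBuffer buffer = new StringBuffer();
    char[] charBuffer = new char[1024];
    int read;
    // Append only the characters actually read, not the whole buffer
    while ((read = in.read(charBuffer)) != -1) {
        buffer.append(charBuffer, 0, read);
    }
    return buffer.toString();
}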