How to convert a binary array to Word format and display it in a textarea using C#

I have binary data stored in a database. Now I want to convert it to a Word document. I have tried ASCII encoding, but it inserts special characters or symbols in between and doesn't look good.
For example, I have resumes in .doc format and I have saved them in a SQL database as byte[]. Now I want to convert that binary data to a Word-compatible format and display it in an editor/textarea.

A Word .doc document is not a text file. It contains lots of binary data that keeps track of styles, fonts, paragraph formatting, and so on, and that is the junk you see. You cannot realistically parse such a file yourself, or display the document accurately; you have to use Word. You can automate Word with the classes in the Microsoft.Office.Interop.Word namespace.
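A quick way to see why ASCII decoding produces junk is to look at the first bytes of what is stored in the database. The following is a minimal sketch; the `DocSniffer` class and `StoredDocKind` enum are hypothetical names chosen for illustration. It relies only on well-known file signatures: legacy .doc files (like .xls and .ppt) are OLE2 compound files starting with `D0 CF 11 E0 A1 B1 1A E1`, .docx files are ZIP packages starting with `PK\x03\x04`, and RTF is plain text starting with `{\rtf`.

```csharp
using System;
using System.Text;

// Hypothetical classification of a byte[] read from the database.
public enum StoredDocKind { Ole2CompoundFile, ZipPackage, Rtf, Unknown }

public static class DocSniffer
{
    // OLE2 compound file signature used by legacy .doc/.xls/.ppt files.
    private static readonly byte[] Ole2Signature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // ZIP local file header ("PK\x03\x04") used by .docx/.xlsx packages.
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    // RTF documents are plain text that begin with "{\rtf".
    private static readonly byte[] RtfSignature = Encoding.ASCII.GetBytes("{\\rtf");

    // Guesses the stored format from the leading "magic" bytes.
    public static StoredDocKind Detect(byte[] data)
    {
        if (StartsWith(data, Ole2Signature)) return StoredDocKind.Ole2CompoundFile;
        if (StartsWith(data, ZipSignature)) return StoredDocKind.ZipPackage;
        if (StartsWith(data, RtfSignature)) return StoredDocKind.Rtf;
        return StoredDocKind.Unknown;
    }

    // True when data begins with every byte of prefix.
    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data == null || data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

Only the RTF case can be turned into a string directly; the other two need Word automation or a library that understands the format.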
An intermediate solution is to store Word documents in the RTF file format. As long as the formatting doesn't get too fancy, a RichTextBox can display it accurately. Storing it in a database column isn't hard either, since it is text.
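If the documents are stored as RTF, getting something a RichTextBox can show is just text decoding. A minimal sketch, assuming the RTF was written 7-bit clean (the RTF specification escapes non-ASCII characters as `\'hh` or `\uN`); `RtfStorage` is a hypothetical helper name:

```csharp
using System;
using System.Text;

public static class RtfStorage
{
    // Converts RTF stored as bytes into the string a WinForms RichTextBox expects
    // in its Rtf property. Assumes 7-bit clean RTF, so ASCII decoding is enough.
    public static string BytesToRtf(byte[] data)
    {
        string rtf = Encoding.ASCII.GetString(data);
        if (!rtf.StartsWith("{\\rtf", StringComparison.Ordinal))
        {
            throw new ArgumentException("Data does not look like an RTF document.", nameof(data));
        }
        return rtf;
    }

    // The reverse direction, e.g. before saving richTextBox.Rtf to a binary column.
    public static byte[] RtfToBytes(string rtf) => Encoding.ASCII.GetBytes(rtf);
}
```

In a WinForms form you would then assign the result to the control, e.g. `richTextBox1.Rtf = RtfStorage.BytesToRtf(bytesFromDb);`, where `richTextBox1` and `bytesFromDb` stand in for your own control and data. If the column is a text column, you can skip the byte conversion and assign the string directly.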

Word's document format is largely proprietary and closed, meaning there is no ready-made interface to which you can pass an array of bytes that Word understands and get a string out of it.

Related

What types of files can be read with the File class in C#?

I have tried reading a text file and an XML file with the File class, and it works fine. I was wondering if we can read Excel, Word, or other types.
var str = File.ReadAllLines("Test.xlsx");
While debugging, str shows special characters.
I hope I have made the question clear. Kindly advise.
Down votes are welcome if accompanied by a proper comment to improve :).
Thanks in advance.
XML and text files are plain files: the text on screen appears exactly as it is stored in the file. That's why File.ReadAllLines works.
With Excel, it is different. The file stores its content in an encoded form, which a program that understands the format (read: MS Excel) decodes and displays correctly on screen.
Think of it as an encoded file that only programs or libraries built for that format can decode: an .xlsx file is actually a ZIP package of XML parts, and an older .xls file uses the binary BIFF format.
To read an Excel file in .NET, you can load it into a DataSet/DataTable, as shown in Read Excel File in C# (Example).
With File.ReadAllLines you can read text files (and XML is, as we know, also a text file).
Of course the function reads other kinds of data files as well, but you will not get meaningful results: the binary data is interpreted as characters. This will not work for Office files.
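To make the difference concrete: an .xlsx file is a ZIP archive of XML parts, so File.ReadAllLines sees compressed bytes, while `System.IO.Compression.ZipArchive` can open it. The sketch below (hypothetical `OfficePackageDemo` helper) lists the parts and reads one of them as text; it does not interpret the spreadsheet itself, which is what dedicated Excel-reading libraries are for.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

public static class OfficePackageDemo
{
    // Lists the parts stored inside an Office Open XML package (.xlsx/.docx).
    public static string[] ListParts(Stream package)
    {
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true))
        {
            return zip.Entries.Select(e => e.FullName).ToArray();
        }
    }

    // Reads one XML part as text, e.g. "xl/workbook.xml".
    public static string ReadPart(Stream package, string partName)
    {
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry(partName)
                ?? throw new FileNotFoundException("Part not found in package.", partName);
            using (var reader = new StreamReader(entry.Open(), Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

For a real workbook opened with `File.OpenRead("Test.xlsx")`, the list typically includes `[Content_Types].xml` and `xl/workbook.xml`. An old binary .xls file is not a ZIP archive, so `ZipArchive` rejects it with an `InvalidDataException`.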
The MSDN documentation for File.ReadAllLines() states that:
This method attempts to automatically detect the encoding of a file based on the presence of byte order marks. Encoding formats UTF-8 and UTF-32 (both big-endian and little-endian) can be detected.
Therefore you can read text files in one of the UTF encodings it supports (a file without a byte order mark is read as UTF-8). To read files that use other encodings (e.g. Windows ANSI code pages or non-Latin text), use the overload that takes an Encoding parameter.
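The effect of the Encoding parameter can be shown with a small round trip. This sketch (hypothetical `EncodingDemo` helper) writes Latin-1 bytes without a byte order mark and reads them back both ways. ISO-8859-1 is used because it is built into every .NET runtime; for a Windows ANSI code page such as 1252 on .NET Core or .NET 5+, you would first need `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`.

```csharp
using System.IO;
using System.Text;

public static class EncodingDemo
{
    // Writes "café" as ISO-8859-1 bytes (no BOM), then reads the file with the
    // default overload and with an explicit encoding.
    public static (string defaultRead, string latin1Read) ReadBothWays(string path)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        File.WriteAllBytes(path, latin1.GetBytes("caf\u00e9\n"));

        // No BOM, so this overload decodes as UTF-8; the lone 0xE9 byte is not
        // valid UTF-8 and is replaced with U+FFFD.
        string[] defaultLines = File.ReadAllLines(path);

        // The overload taking an Encoding decodes the bytes as intended.
        string[] latin1Lines = File.ReadAllLines(path, latin1);

        return (defaultLines[0], latin1Lines[0]);
    }
}
```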

How to convert a PDF, with its formatting, to XLS in C#?

I want to convert this PDF page (this is the PDF screenshot) to an .xls file, along with the columns.
You should be able to use a PDF parsing library to extract the text. This could range from very easy to impossible, depending on how the table is represented internally. If it is represented as an image, you will also need an OCR library. In the easiest case you could just extract all the text as a string and split it into rows at newlines and into columns at tabs or other whitespace.
Try this one and see what happens: http://www.squarepdf.net/parsing-pdf-files-using-itextsharp
Edit: I focused on the PDF-reading part. Writing to Excel is more than covered by a quick Google search.
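For the easiest case described in the answer, where the PDF library hands back the table as plain text, the row and column splitting can be sketched as follows. `TextTableToCsv` is a hypothetical helper that assumes rows are separated by newlines and columns by tabs or runs of two or more spaces (single spaces are treated as part of a cell). It writes CSV, which Excel opens directly, rather than a true .xls file, which would need an Excel library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class TextTableToCsv
{
    // Rows = lines; columns = tabs or runs of 2+ spaces. Returns CSV text.
    public static string Convert(string extractedText)
    {
        var csv = new StringBuilder();
        string[] lines = extractedText.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            string trimmed = line.Trim();
            if (trimmed.Length == 0) continue; // skip whitespace-only lines

            IEnumerable<string> cells = Regex.Split(trimmed, @"\s*\t\s*| {2,}").Select(Quote);
            csv.Append(string.Join(",", cells)).Append("\r\n");
        }
        return csv.ToString();
    }

    // Quotes a cell containing a comma, quote, or line break (RFC 4180 style).
    private static string Quote(string cell)
    {
        if (cell.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0) return cell;
        return "\"" + cell.Replace("\"", "\"\"") + "\"";
    }
}
```

If the extracted text does not keep the column gaps (some PDFs position each word individually), this simple split will not recover the columns and a position-aware extraction is needed.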

How to convert an HTML string into an OOXML (Word 2007) formatted string using Open XML SDK 2.0?

I have a string in HTML format (it may contain HTML tables). I want to convert this HTML string into its OOXML (Word 2007) representation and store it in my DB table. Later, once I have collected all the HTML strings, I have to build the Word document from the OOXML strings stored in the DB.
So my problem is converting the HTML string into an equivalent OOXML-formatted string for Word 2007.
I already tried the AlternativeFormatImportPart class in Open XML SDK 2.0, but it builds the Word document directly from the given HTML string. I don't want that; I just want to convert the HTML string to an OOXML-formatted string for Word 2007.
If you mean you want to convert HTML code to OpenXML code (elements), you cannot do it directly. Either write your own converter or use one of the available ones (there are not many good ones that are actively supported). Here is one: http://html2openxml.codeplex.com/
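To illustrate the "write your own converter" route, here is a deliberately tiny sketch, not a real HTML parser: it accepts only well-formed XHTML without a namespace, handles `<p>`, `<b>`/`<strong>` and plain text, and emits WordprocessingML `w:p`/`w:r`/`w:t` elements as a string. The `TinyHtmlToWordml` name is hypothetical; tables, lists, and styles would need far more work, which is why an existing converter is usually the better choice.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class TinyHtmlToWordml
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Converts <p> elements of a small XHTML subset into a <w:body> string.
    public static string Convert(string xhtml)
    {
        XElement root = XElement.Parse(xhtml);
        var body = new XElement(W + "body",
            new XAttribute(XNamespace.Xmlns + "w", W.NamespaceName),
            root.DescendantsAndSelf("p").Select(ToParagraph));
        return body.ToString(SaveOptions.DisableFormatting);
    }

    // One <w:p> per <p>; each child node becomes a run.
    private static XElement ToParagraph(XElement p)
    {
        var paragraph = new XElement(W + "p");
        foreach (XNode node in p.Nodes())
        {
            if (node is XText text)
                paragraph.Add(Run(text.Value, bold: false));
            else if (node is XElement el && (el.Name.LocalName == "b" || el.Name.LocalName == "strong"))
                paragraph.Add(Run(el.Value, bold: true));
            else if (node is XElement other)
                paragraph.Add(Run(other.Value, bold: false)); // unknown tags: keep text only
        }
        return paragraph;
    }

    // Builds <w:r>[<w:rPr><w:b/></w:rPr>]<w:t xml:space="preserve">text</w:t></w:r>.
    private static XElement Run(string text, bool bold)
    {
        return new XElement(W + "r",
            bold ? new XElement(W + "rPr", new XElement(W + "b")) : null,
            new XElement(W + "t", new XAttribute(XNamespace.Xml + "space", "preserve"), text));
    }
}
```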

How to use the clipboard to move data from a .NET application to Excel?

Which is Excel's preferred format for receiving data from the clipboard? The data is in a C# / .NET application.
I had been saving to the clipboard in CSV format, but now I want to start giving Excel formatting information (e.g. making some cells bold). CSV format is no longer enough.
When I copy from Excel, the clipboard holds 24 formats!
System.Windows.Clipboard.GetDataObject().GetFormats().Dump();
EnhancedMetafile
System.Drawing.Imaging.Metafile
MetaFilePict
Bitmap
System.Drawing.Bitmap
System.Windows.Media.Imaging.BitmapSource
Biff12
Biff8
Biff5
SymbolicLink
DataInterchangeFormat
XML Spreadsheet
HTML Format
Text
UnicodeText
System.String
CSV
Rich Text Format
Embed Source
Object Descriptor
Link Source
Link Source Descriptor
Link
Format129
I believe what you're seeing is that Excel puts the data on the clipboard in many different formats when you copy, so that whatever application you paste into can pick one it understands. You probably need to look into Excel's Office XML format.
See this example XML at Wikipedia for a better idea of the format. While I've never used it, I'm fairly sure Excel would let you paste the XML directly (if it follows the right schema).
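Following the XML Spreadsheet suggestion, the sketch below builds a minimal SpreadsheetML (Excel 2003 XML) document whose first row is bold. `SpreadsheetMlBuilder` is a hypothetical name, and it assumes Excel accepts this payload under the `XML Spreadsheet` clipboard format, which matches the format name in the list above but is not verified here. All cells are written as strings for simplicity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SpreadsheetMlBuilder
{
    private static readonly XNamespace Ss = "urn:schemas-microsoft-com:office:spreadsheet";

    // Builds a SpreadsheetML workbook; row 0 uses a bold "header" style.
    public static string Build(IList<string[]> rows)
    {
        var table = new XElement(Ss + "Table",
            rows.Select((cells, rowIndex) => new XElement(Ss + "Row",
                cells.Select(value => new XElement(Ss + "Cell",
                    rowIndex == 0 ? new XAttribute(Ss + "StyleID", "header") : null,
                    new XElement(Ss + "Data", new XAttribute(Ss + "Type", "String"), value))))));

        var workbook = new XElement(Ss + "Workbook",
            new XAttribute("xmlns", Ss.NamespaceName),
            new XAttribute(XNamespace.Xmlns + "ss", Ss.NamespaceName),
            new XElement(Ss + "Styles",
                new XElement(Ss + "Style", new XAttribute(Ss + "ID", "header"),
                    new XElement(Ss + "Font", new XAttribute(Ss + "Bold", "1")))),
            new XElement(Ss + "Worksheet", new XAttribute(Ss + "Name", "Sheet1"), table));

        return "<?xml version=\"1.0\"?>" + Environment.NewLine + workbook.ToString();
    }
}
```

To put it on the clipboard, a commonly reported approach (untested here) is to wrap the UTF-8 bytes in a `MemoryStream`, store it with `dataObject.SetData("XML Spreadsheet", stream)` on a `DataObject`, also add a `Text` or `CSV` version for other applications, and call `Clipboard.SetDataObject(dataObject, true)`.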

How to extract text from Word files using C#?

I am trying to convert a large number (100,000) of Word DOC files, which are quite old, from around the Word 1995 to 2000 versions, I suppose. I keep going around in circles with what I see here on Stack Overflow and in the MS documentation.
What I want to do is simply read the file, put the text into a string, parse the string, and pull out the structure (the file is actually a structured report that looks like Patient: Jon Doe). From that point I know what I am doing: I can parse the string data, put it into useful variables, and then store this data in a database. But I do not know how to actually get the text into a string. Any help?
PPS: I found this reference which supposedly converts a DOC file into a text file. It's a start, but I'd rather avoid doing a bunch of file manipulations.
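Once the text is in a string, the "Patient: Jon Doe" structure can be pulled out with something like the following sketch. `ReportFieldParser` is a hypothetical helper; it assumes one `Label: value` field per line and also splits on a bare `\r`, since text extracted from Word often uses a carriage return as the paragraph mark.

```csharp
using System;
using System.Collections.Generic;

public static class ReportFieldParser
{
    // Parses "Label: value" lines into a case-insensitive dictionary.
    // Lines without a colon are skipped; later duplicates overwrite earlier ones.
    public static Dictionary<string, string> Parse(string text)
    {
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string rawLine in text.Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None))
        {
            int colon = rawLine.IndexOf(':');
            if (colon <= 0) continue; // no label on this line

            string key = rawLine.Substring(0, colon).Trim();
            string value = rawLine.Substring(colon + 1).Trim();
            if (key.Length > 0) fields[key] = value;
        }
        return fields;
    }
}
```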
If you try to use the Word object model, you must always have a specific version of Word installed on the client (running Word on a server is not recommended). Unfortunately, you'll depend on Word's restrictions concerning older files; e.g. Word 2010 opens Office 95 files only in sandbox mode (i.e. you can't access the file content programmatically). Additionally, you'll have to deal with unknown template content (for example, documents with attached macros).
In your case I'd rather look for a third-party component that gives access to the content.
I know that document management systems like OpenText eDocs and Autonomy iManage use other tools to full-text index documents of all types and can present the content in a viewer application. So if you look in this direction, you may find something useful.
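A Word file is just a normal file as far as file I/O goes, so you can read it like any other file.
Try this:
using System.IO;
// Reads the raw contents as text; a binary .doc still comes back as unreadable characters.
string text;
using (var streamReader = new StreamReader(filePath))
{
    text = streamReader.ReadToEnd();
}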
