I am working with Office Interop (I can't use OOXML) and want to copy a table from an Excel file into an RTF file.
First, I copy the table in Excel:
excelSheet = excelBook.Worksheets[1];
excelBook.CheckCompatibility = false;
excelRange = excelSheet.Range["B12:F21"];
excelRange.Copy();
Then, in Word (with the RTF file open), I paste it:
wordApplication.Selection.Find.Execute(placeholder);
WordRange range = wordApplication.Selection.Range;
if (range.Text.Contains(placeholder))
    range.Paste();
placeholder contains the marker text I use to find where the table should be pasted.
In that Excel table, some cells are formatted as currency, so they contain values such as 3,56 €. After the paste, however, Word (the RTF file) shows 3.56 $. Notice the change from , to . and from € to $.
However, if I do all of this manually (open the Excel file in Excel, select all cells of the table, press Ctrl+C, open the RTF file in Word, position the cursor and press Ctrl+V), I get the correct value (euros).
Any ideas how I can make this work programmatically the same way it does manually?
So I found a small workaround, thanks to @JensKloster for the idea.
After pasting the table (as in the question), I load the RTF file as text into a variable and apply the following:
cnt = Regex.Replace(cnt, @"(?<main>\d+)\.(?<decimals>\d{2,3})", "${main},${decimals}").Replace("$", "€");
This seems to do the job (at least so far I have found no issues with the RTF templates currently in use, although issues could easily show up in the future).
Pending someone smarter giving me a better option, this will become the answer.
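The one-liner above can be wrapped in a small helper so it can be tested on its own. This is a minimal sketch, not the author's exact code: RtfCurrencyFix and Apply are hypothetical names, and it assumes the RTF has already been read into a string. It performs the same two replacements: the decimal point becomes a comma in any number with 2 or 3 decimals, then every $ becomes €.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the workaround from this answer.
static class RtfCurrencyFix
{
    // Same pattern as the workaround: digits, a '.', then 2 or 3 decimal digits.
    private static readonly Regex DecimalPoint =
        new Regex(@"(?<main>\d+)\.(?<decimals>\d{2,3})", RegexOptions.Compiled);

    // Rewrites "3.56 $" style values to "3,56 €".
    // Caveat: Replace("$", "€") changes every '$' in the text, not only currency values.
    public static string Apply(string rtf)
    {
        string withCommas = DecimalPoint.Replace(rtf, "${main},${decimals}");
        return withCommas.Replace("$", "€");
    }
}
```

One thing worth checking when writing the result back: RTF readers interpret raw non-ASCII bytes in the document's ANSI code page (declared with \ansicpg, often 1252), so if the modified text is saved as UTF-8 the euro sign may come out garbled.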
TL;DR:
How can I capture the paragraph numbering as a 'part' of the text and export it to a DOCX?
Problem
I have a document that's split into sections and sub-sections that reads similarly to a set of state statutes (Statute 208, with subsections Statute 208.1, Statute 208.2, etc.). We created this by modifying the numbering.xml file within the .docx zip.
I want to export a 'sub-section' (208.5) and its text to a separate .docx file. My VSTO add-in exports the text well enough, but the numbering resets to 208.1. This does make some sense as it's now the first paragraph with that <ilvl> in the document.
PDF works okay
Funnily enough, I'm able to call Word.Range's ExportAsFixedFormat function and export this selection to PDF just fine - even retaining the numbering. This led me down a path of trying to 'render' the selection, possibly as it would be printed, in order to throw it into a new .docx file, but I haven't figured that out, either.
What I've tried:
Range.ExportFragment() using both wdFormatStrictOpenXMLDocument and wdFormatDocumentDefault as the WdSaveFormat values.
These export but also reset the numbering.
Document.PrintOut() using PrintToFile = true and a valid file name. I realize now that this, quite literally, generates 'printout instructions' and won't create a file with a valid document structure at that path.
Plainly doesn't work. :)
Reading Application.Selection.XML into a variable content and calling Document.Content.InsertXML(content) on a newly added Document object.
Still resets the numbering.
Code Section for Context
using Word = Microsoft.Office.Interop.Word;
Word.Range range = Application.ActiveDocument.Range(startPosition, endPosition);
range.Select();
//export to DOCX?
Application.Selection.Range.ExportFragment(
filename, Word.WdSaveFormat.wdFormatDocumentDefault);
You could use ConvertNumbersToText(wdNumberAllNumbers) before exporting, then _Document.Undo() or close without saving after the export.
There is some good information at this (dated) link that still should work with current Word APIs:
https://forums.windowssecrets.com/showthread.php/27711-Determining-which-ListTemplates-item-is-in-use-(VBA-Word-2000)
Information at that link suggests that you can create a name/handle for your ListTemplate so that you can reference it in code, as long as your statute-style bullets are associated with a named style for the document. The idea is to first name the ListTemplate that's associated with the statute bullet style for the active document and then reference that name when accessing the ListLevels collection.
For instance, you could have code that looks something like this:
ActiveDocument.Styles("StatutesBulletStyle").ListTemplate.Name = "StatuteBulletListTemplate"
After the above assignment, you can refer to the template by name:
ActiveDocument.ListTemplates("StatuteBulletListTemplate").ListLevels(1).StartAt = 5
With this technique, you no longer need to figure out which template is active.
Does that help?
I have an issue I've been stuck on for over a year now. I made a Forms application in VB.NET that lets the user type in some information and select items that represent .docx files containing tables with special formatting, pictures and other formatting quirks.
At the end the software creates a Word document via Office.Interop, using the information the user provided in text fields in the Forms and the items they selected (e.g. it creates a table in Word, listing the user's selections with some extra info) and then appends the content from multiple docx-files depending on the user's selection to the document created via Interop.
The problem is: To achieve this I had to use a pretty dirty method:
I open the respective .docx files, select all content (Range.WholeStory()) and copy it (Range.Copy()). Then I paste this content from the clipboard into my newly created document with the following option:
Selection.PasteAndFormat (wdFormatOriginalFormatting)
This produces a satisfactory result but it feels super dirty since it uses the user's clipboard (which I save at the beginning of the runtime and restore at the end).
I originally tried the Selection.InsertFile method, and I tried it again today, but it completely messes up the formatting.
When the content of the .docx is inserted this way, it has neither the formatting of the original .docx nor that of the file I created with the program. For example, the SpaceBefore and SpaceAfter values are wrong, even if I explicitly define them in my created file. Changing the formatting afterwards is not an option, since the source files contain a lot of special formatting and can change at any time.
Another factor that makes this hard: I cannot save the file before it is presented to the user, and using a temp folder is not an option in the environment this application is deployed into, so basically everything happens in RAM.
Summary:
Basically, what I want is the same outcome as my "Copy and Paste" method with wdFormatOriginalFormatting, but WITHOUT using the clipboard. The problem is that the InsertFile method doesn't provide a formatting option.
Any idea or help would be greatly appreciated.
Edit:
The FormattedText approach suggested by Rich Michaels produces the same result as the InsertFile method. Here is the relevant part of what I did (word is the Microsoft.Office.Interop.Word.Application):
' Opening the source file
Dim doctemp As Microsoft.Office.Interop.Word.Document
doctemp = word.Documents.Open(doctempfilepath)
' Selecting the whole document; this is what I did for the "Copy/Paste" method, too
doctemp.Range.WholeStory()
Dim insert_range As wordoptions.Range
doc_destination.Activate()
' Jumping to the end and selecting the range
word.Selection.EndKey(Unit:=Microsoft.Office.Interop.Word.WdUnits.wdStory)
insert_range = word.Selection.Range
' Inserting the text
insert_range.FormattedText = doctemp.Range.FormattedText
doctemp.Close(False)
Use the Range.FormattedText property. It doesn't touch the clipboard and it maintains the source formatting. The process is ...
Set the range in the Source document you want "copied" and set the insertion point in the Destination document and then,
DestinationRange.FormattedText = SourceRange.FormattedText
For a project I'm working on, we need to copy data from Excel sheets into new tables within a Word document and have a strategy that works... in most cases.
First, we do
string file = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString() + ".mht");
object sheetObj = sheetName;
object trueObj = true;
Excel.PublishObject obj = workbook.PublishObjects.Add(Excel.XlSourceType.xlSourceSheet, file, sheetObj);
obj.Publish(trueObj);
then
Document tempDocument = wordApp.Documents.Open(file);
and read tempDocument into the target Word document.
...
In a couple of cases, we're seeing problems. (The problems are illustrated in the example files at http://thinkscience.us/office/examples.zip)
1) the big text files show text truncation between Excel and the exported .mht
2) the 'nutritional' files show the addition of several lines of white space between the Excel data and the .mht.
I've tried several variations on the parameters to PublishObjects.Add (e.g. using a range rather than an entire sheet). The Add method includes an optional XlHtmlType parameter, but it only works with the value XlHtmlType.xlHtmlStatic.
Has anyone used PublishObjects.Add or another strategy to transfer sheets from Excel to Word, preserving as much formatting as possible and not interfering with the system clipboard?
I found an article that might meet your needs, but it doesn't use Office automation; it uses a library called Spire.Office. See: How to Maintain Formatting of Cells when Copying Cells from Excel to Word.
I received a requirement to save data in a CSV file and send it to customers.
Customers use both Excel and Notepad to view this file.
The data looks like this:
975567EB, 973456CE, 971343C8
Some of my values end in "E3", like:
98765E3
so when the file is opened in Excel, the value changes to:
9.8765E+7
I wrote a C# program that forces the value to text by wrapping it as ="98765E3":
while (!sr.EndOfStream) {
    var line = sr.ReadLine();
    var values = line.Split(',');
    values[0] = "=" + "\"" + values[0] + "\""; // Change number format to string
    listA.Add(new string[] { values[0], values[1], values[2], values[3] });
}
But customers who open the CSV file in Notepad see:
="98765E3"
How can I save a number as text in a CSV file so that it shows the same result in both Excel and Notepad? Any suggestion is greatly appreciated!
Don't shoot the messenger.
Your problem is not the way you are exporting (creating...?) data in C#. It is with the way that you are opening the CSV files in Excel.
Excel has numerous options for importing text files that allow for the use of a FieldInfo parameter that specifies the TextFileColumnDataTypes property for each field (aka column) of data being brought in.
If you choose to double-click a CSV file from an Explorer folder window, then you will have to put up with whatever field types Excel 'best-guesses' for each column. It's not going to stop halfway through an import process to ask your opinion. Some common errors include:
An alphanumeric value with an E will often be interpreted as scientific notation.
Half of the DMY dates will be misinterpreted as the wrong MDY dates (or vice versa). The other half will become text, since Excel cannot process something like 14/08/2015 as MDY.
Any value that starts with a + will produce a #NAME? error, because Excel thinks you are attempting to bring in a formula that refers to a name.
That's a short list of common errors. There are others. Here are some common solutions.
Use Data ► Get External Data ► From Text. Explicitly specify any ambiguous column data type; e.g. 98765E3 as Text, dates as either DMY, MDY, YMD, etc as the case may be. There is even the option to discard a column of useless data.
Use File ► Open ► Text Files which brings you through the same import wizard as the option above. These actions can be recorded for repeated use using either command.
Use VBA's Workbooks.OpenText method and specify each column's FieldInfo position and data type (the latter with an XlColumnDataType constant).
Read the import file into memory and process it in a memory array before dumping it into the target worksheet.
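The last option above could look something like the following sketch. It assumes a simple comma-separated file with no quoted fields (like the sample data in the CSV question), and CsvImport.ToTextArray is a hypothetical name. It builds a rectangular object[,] of strings, the shape that Excel's Range.Value2 accepts in a single assignment; the Excel side is not shown, and the target range would presumably also need its NumberFormat set to "@" (Text) before the assignment, otherwise Excel may still convert values such as 98765E3.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: parse simple CSV text into a 2-D array of strings.
static class CsvImport
{
    // Every cell is kept as a string, so no value is guessed as a number here.
    public static object[,] ToTextArray(string csvText)
    {
        var rows = new List<string[]>();
        int width = 0;
        foreach (string line in csvText.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries))
        {
            string[] fields = line.Split(',');
            for (int i = 0; i < fields.Length; i++)
                fields[i] = fields[i].Trim(); // the sample data has a space after each comma
            rows.Add(fields);
            width = Math.Max(width, fields.Length);
        }

        // Pad short rows with empty strings so the array is rectangular.
        var result = new object[rows.Count, width];
        for (int r = 0; r < rows.Count; r++)
            for (int c = 0; c < width; c++)
                result[r, c] = c < rows[r].Length ? rows[r][c] : string.Empty;
        return result;
    }
}
```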
There are less precise solutions that are still subject to some interpretation from Excel.
Use a prefix character (the apostrophe reported by Range.PrefixCharacter) to force numbers with leading zeroes, or alphanumeric values that could conceivably be misinterpreted as scientific notation, into the worksheet as text.
Use a text qualifier character, typically ASCII character 34 ("), to wrap values you want to be interpreted as text.
Copy and paste the entire text file into the target worksheet's column A then use the Range.TextToColumns method (again with FieldInfo options available for each column).
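For the text-qualifier option, a minimal sketch of writing each field wrapped in double quotes, with any embedded quote doubled (the usual CSV convention). CsvWriter.QuoteField and CsvWriter.JoinLine are hypothetical names. As noted above, this is one of the less precise options: whether Excel keeps a quoted value such as 98765E3 as text when the file is simply double-clicked is not guaranteed, and Notepad users will see the quotes.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for writing CSV fields with a text qualifier.
static class CsvWriter
{
    // Wraps a value in the text qualifier (") and doubles any embedded quotes.
    public static string QuoteField(string value) =>
        "\"" + value.Replace("\"", "\"\"") + "\"";

    // Builds one CSV line with every field quoted.
    public static string JoinLine(IEnumerable<string> fields) =>
        string.Join(",", fields.Select(QuoteField));
}
```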
These latter two methods are going to cause some odd values in Notepad, but Notepad isn't Excel and cannot process half a million calculations and other operations in several seconds. If you must mash up the two programs, there will be some compromises.
My suggestion is to leave the values as best as they can be in Notepad and use the facilities and processes readily available in Excel to import the data properly.
I'm able to connect to and read an Excel file without any problem. But when importing data such as ZIP codes that have leading zeros, how do you prevent Excel from guessing the data type and, in the process, stripping out the leading zeros?
I believe you have to set the option in your connect string to force textual import rather than auto-detecting it.
Provider=Microsoft.ACE.OLEDB.12.0;
Data Source=c:\path\to\myfile.xlsx;
Extended Properties=\"Excel 12.0 Xml;IMEX=1\";
Your mileage may vary depending on the version you have installed. The IMEX=1 extended property tells the driver to treat intermixed data as text.
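The same connection string built in C#, as a sketch; ExcelOleDb.BuildConnectionString is a hypothetical helper. What it illustrates is the quoting: the Extended Properties value itself contains a semicolon, so it has to be wrapped in double quotes inside the connection string. The resulting string would then be passed to System.Data.OleDb.OleDbConnection, which is Windows-only and requires the ACE provider to be installed.

```csharp
// Hypothetical helper that assembles the ACE OLE DB connection string shown above.
static class ExcelOleDb
{
    // Extended Properties contains ';', so its value is wrapped in double quotes.
    public static string BuildConnectionString(string path, bool imex = true)
    {
        string ext = "Excel 12.0 Xml" + (imex ? ";IMEX=1" : "");
        return $"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={path};Extended Properties=\"{ext}\";";
    }
}
```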
Prefix with '
Prefixing the contents of the cell with ' forces Excel to see it as text instead of a number. The ' won't be displayed in Excel.
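A sketch of applying the ' prefix programmatically before values are written to cells. The rules for which values get the prefix (a leading zero followed by more digits, or digits followed by E and more digits, which Excel reads as scientific notation) are assumptions chosen to cover the examples in this thread, and ExcelText.ProtectForExcel is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: add the ' text prefix only where Excel would convert the value.
static class ExcelText
{
    // Digit strings with a leading zero, e.g. ZIP codes such as "01234" (assumed rule).
    private static readonly Regex LeadingZero = new Regex(@"^0\d+$");

    // Strings Excel would read as scientific notation, e.g. "98765E3" (assumed rule).
    private static readonly Regex ScientificLike = new Regex(@"^\d+(\.\d+)?[eE][+-]?\d+$");

    // Returns the value with a leading ' when one of the rules above matches.
    public static string ProtectForExcel(string value) =>
        LeadingZero.IsMatch(value) || ScientificLike.IsMatch(value) ? "'" + value : value;
}
```

With Interop, the returned string would be assigned to a cell (for example through Range.Value2); Excel then stores the rest of the string as text and does not display the apostrophe.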
There is a registry hack that can force Excel to read more than the first 8 rows when sampling a column to determine its type. Change
HKLM\Software\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows
to 0 to read all rows, or to another number to sample that many rows.
Note that this will have a slight performance hit.
I think the way to do this would be to format the source Excel file so that the column is formatted as Text instead of General. Select the entire column, right-click, select Format Cells, and choose Text from the list of options.
I think that would explicitly define that the column content is text and should be treated as such.
Let me know if that works.
Saving the file as a tab delimited text file has also worked well.
Unfortunately, we can't rely on the columns of the Excel document staying in a particular format, since users will be pasting data into it regularly. I don't want the app to crash if we rely on a certain data type for a column.
Prefixing with ' would work; is there a reasonable way to do that programmatically once the data already exists in the Excel document?
Sending the value 00022556 as '=" 00022556"' from SQL Server is an excellent way to handle the leading-zero problem.
Add "\t" before your string; the value will then start with a tab character.