Corrupted file after `WordprocessingDocument.Open`

I have a problem with this:
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true)) { }
After running just the line above, opening the document in Word shows an error message saying the file is corrupted. Interestingly, LibreOffice opens the file without complaint. I used WinMerge to compare the XML files inside the docx before and after running this code, and they are identical. The only difference is the size of the docx file. Why?

OK, I resolved the problem. It's not a nice solution, but it works.
var document = "template.docx";
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
{
    // some editing stuff
    wordDoc.Clone("ready.docx");
}
Now template.docx is corrupted, but ready.docx is fine.

I had the same problem, and your solution worked for me:
wordDoc.Clone("ready.docx");
In my case, I found that the cause was character encoding. My file had been generated by ABBYY (image-to-text conversion to docx).
To check whether encoding is causing your trouble:
rename youreditedword.docx to youreditedword.zip
open the .zip, go to the word folder, and open document.xml
check the text in document.xml that appears in Word. If it is garbled, you have an encoding problem.
I fixed it this way: I deleted one phrase in my original, unedited Word file, typed it in again, and saved the file. This probably changes the file's encoding. After that, opening the file with the Open XML library no longer produced the "file is corrupted" error.
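The manual check above can also be scripted, since a .docx file is a ZIP archive and can be read without renaming it. The following is a minimal sketch, assuming the standard System.IO.Compression classes are available (on .NET Framework this needs a reference to System.IO.Compression.FileSystem). The names DocxInspector and ReadDocumentXml are hypothetical, and the code assumes document.xml is UTF-8 encoded.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, not part of the Open XML SDK.
public static class DocxInspector
{
    // Returns the raw text of word/document.xml from a .docx file,
    // or null if the archive has no such entry.
    public static string ReadDocumentXml(string docxPath)
    {
        // Open the .docx directly as a ZIP archive (read-only).
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
            {
                return null;
            }

            // Assumption: the part is UTF-8; a byte order mark, if present,
            // overrides this because StreamReader detects it by default.
            using (var reader = new StreamReader(entry.Open(), Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.Error.WriteLine("usage: DocxInspector <file.docx>");
            return;
        }

        string xml = ReadDocumentXml(args[0]);
        if (xml == null)
        {
            Console.Error.WriteLine("word/document.xml not found");
            return;
        }

        // Print only the beginning so garbled text is easy to spot.
        Console.WriteLine(xml.Substring(0, Math.Min(xml.Length, 2000)));
    }
}
```

If the printed text does not match what Word displays, that points to the encoding problem described above.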

Related

OpenXml SpreadsheetDocument SaveAs() produces corrupted document - why?

This dead-simple code creates a file that Excel won't open.
How could this be failing?
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(@"c:\dir\src.xlsx", true))
{
    doc.SaveAs(@"c:\dir\saved.xlsx");
}
Notes:
Excel won't open saved.xlsx
src.xlsx is known to exist and be valid (Excel opens it no problem)
saved.xlsx is indeed produced, though it's about 500 bytes smaller than src.xlsx

If you meant this error: [screenshot of the error]
Stop debugging before opening the "saved.xlsx" file.
I've checked. It works correctly:
[screenshot: Output file]

Editing XPS Print Spool Files(.SPL Extension) in C# (Saving as Zip Problem)

When someone prints a document (via the XPS print path), I want to pause the print job and edit the SPL file (which is a zipped XPS format).
If I edit the file with 7-Zip, save it, and resume the job, the document prints without any problem.
If I open the SPL file with the System.IO.Compression.ZipFile class, the DotNetZip library, or the SevenZipSharp library, extract a file from it, remove that file from the SPL file, and add it back again, the result is a perfectly fine zip container. I compared the original and edited SPL files with 7-Zip, zipinfo, and WinRAR and saw no difference. All files in the container are exactly the same. I also checked the CRCs.
When opening, editing, and saving the zip file, I don't change the compression method, compression level, or anything else. As I said, the two zip files look exactly the same, but if I calculate the CRCs of the original and edited SPL files, they differ.
After the edit (just extracting a page file, deleting it from the container, and adding it again), if I try to resume the print job, I see an error about PrintProcessor in Event Viewer and the job doesn't print.
I can't figure out what changes when I edit the file (I'm not changing anything in the container). I'm going crazy.
Is there any specification about the Zip format of SPL files?
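When the entry contents match but the two files differ as a whole, the difference has to be in container-level details such as entry order or compressed sizes. The following is a minimal sketch for listing those details side by side, assuming the standard System.IO.Compression classes. ZipCompare, Describe, and Diff are hypothetical names, and nothing in this post confirms which of these details the print processor actually objects to.

```csharp
using System;
using System.Collections.Generic;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper for comparing two ZIP containers entry by entry.
public static class ZipCompare
{
    // One line per entry, in the order the archive lists them:
    // name, uncompressed size, compressed size.
    public static List<string> Describe(string zipPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(zipPath))
        {
            return archive.Entries
                .Select(e => $"{e.FullName} length={e.Length} compressed={e.CompressedLength}")
                .ToList();
        }
    }

    // Returns one report line for every position where the two listings differ.
    public static List<string> Diff(string originalPath, string editedPath)
    {
        List<string> original = Describe(originalPath);
        List<string> edited = Describe(editedPath);
        var report = new List<string>();

        int count = Math.Max(original.Count, edited.Count);
        for (int i = 0; i < count; i++)
        {
            string left = i < original.Count ? original[i] : "(missing)";
            string right = i < edited.Count ? edited[i] : "(missing)";
            if (left != right)
            {
                report.Add($"#{i}: {left} | {right}");
            }
        }
        return report;
    }

    public static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.Error.WriteLine("usage: ZipCompare <original> <edited>");
            return;
        }

        foreach (string line in Diff(args[0], args[1]))
        {
            Console.WriteLine(line);
        }
    }
}
```

An empty report means the entry names, order, and sizes match, so any remaining difference lies in fields this sketch does not read.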

The problem is solved if I use the "ZipPackage" class.
using (var pack = ZipPackage.Open(xpsFileName, FileMode.Open, FileAccess.ReadWrite))
{
    foreach (var part in pack.GetParts())
    {
        if (part.Uri.OriginalString.EndsWith(".fpage"))
        {
            // Rewrite the page part in place through the package stream
            using (var file = part.GetStream(FileMode.Open, FileAccess.ReadWrite))
            {
                var page = ProcessPage(XElement.Load(file));
                file.Position = 0;
                page.Save(file);
                // Truncate leftover bytes if the new content is shorter
                file.SetLength(file.Position);
            }
        }
    }
}

iText GetTextFromPage exception with inline image

I have the same problem as was discussed here, which was not solved. My objective is to extract the text from an existing PDF file. I get the error message "Could not find image data or EI" for a certain PDF, which I cannot share as a sample. The following code works for other PDFs:
string fileURI = "C:\\Test\\Sample.pdf";
PdfReader reader = new PdfReader(fileURI);
ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
string s = PdfTextExtractor.GetTextFromPage(reader, 1, strategy);
Debug.WriteLine(s);
I am using iTextSharp 5.5.0 and tried changing found == 1 to found <= 1, as suggested in other posts. It does not help.
Would it help to remove all images in the PDF? I really just need the text. Which iText commands could help me with this?

I downloaded the trial version of Acrobat to create a version of the PDF file that I could share. After opening the file and saving it again as "Optimized PDF" in Acrobat, the code worked and I could extract the text.
So the solution is probably to open each file in Acrobat, save it again with the right settings using the Acrobat reference, and then extract the text.

How to convert docx to html file using open xml with formatting

I know there are a lot of questions with the same title, but I'm still having issues and couldn't find the right way to go.
I am using Open XML SDK 2.5 along with PowerTools to convert a .docx file to an .html file, using the HtmlConverter class for the conversion.
I can successfully convert the docx file into an HTML file, but the HTML file doesn't retain the original formatting of the document. For example, font size, color, underline, and bold are not reflected in the HTML file.
Here is my existing code:
public void ConvertDocxToHtml(string fileName)
{
    byte[] byteArray = File.ReadAllBytes(fileName);
    using (MemoryStream memoryStream = new MemoryStream())
    {
        memoryStream.Write(byteArray, 0, byteArray.Length);
        using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStream, true))
        {
            HtmlConverterSettings settings = new HtmlConverterSettings()
            {
                PageTitle = "My Page Title"
            };
            XElement html = HtmlConverter.ConvertToHtml(doc, settings);
            File.WriteAllText(@"E:\Test.html", html.ToStringNewLineOnAttributes());
        }
    }
}
So I just want to know whether there is any way to retain the formatting in the converted HTML file.
I know about some third-party APIs that do the same thing, but I would prefer a way to do this using Open XML or another open-source option.

PowerTools for Open XML just released a new HtmlConverter module. It now contains an open source, free implementation of a conversion from DOCX to HTML formatted with CSS. The module HtmlConverter.cs supports all paragraph, character, and table styles, fonts and text formatting, numbered and bulleted lists, images, and more. See https://openxmldeveloper.org/

Your end result will not look exactly like your Word document, but this link might help.

You might want to find an external tool to help you do this, like Aspose.Words.

You can use the OpenXML Viewer extension for Firefox to convert with formatting.
http://openxmlviewer.codeplex.com
This works for me. Hope this helps.

Creating report with Aspose.Word without losing formatting

I am using Aspose.Words to create reports from a template file (.docx file type).
After using Aspose.Words to modify the template file and save it to a new file, the formatting of the template file was lost (such as bold text, comments, etc.).
I have tried:
Aspose.Words.Document doc = new Document(inputStream);
var outputStream = new MemoryStream();
doc.Save(outputStream, SaveFormat.Docx);
What I did not expect is that outputStream has far fewer bytes than inputStream, even though I have not yet made any modification to doc. That may be the reason the report file loses its formatting.
What should I try now?

OK, the problem is that the version of Aspose.Words I'm using does not support the docx file type. It can still read the text of a .docx file, but only the text (without any associated formatting).

We Keep Coding
