I am currently building an Excel file by hand using OpenXml. I'm in the process of adding the sheets, but I have come across an issue: I have a loop that adds each sheet by name, but once it runs and I try to open the file, I get the following message:
"We found a problem with some content in 'FileName.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, Click Yes."
I think the issue might be that I am setting the name of each sheet from a string variable. When I take it out and use something else, it works. Below is the code where I loop through and add my sheets.
//Technology Areas
foreach (DataRow dr in techAreaDS.Rows)
{
var data = dr["TechAreaName"].ToString().Split('-');
var techArea = data[2].TrimStart();
var techAreaSheet = new Sheet { Id = workbookPart.GetIdOfPart(worksheetPart),
SheetId = sheetId, Name = techArea };
sheets.Append(techAreaSheet);
sheetId++;
}
I've seen people mention that this can be an issue with cells containing strings that can be converted into numbers, but in this case the string will always be a string. Any help would be appreciated.
EDIT: I've figured out the problem. The issue is the Name property has a Max Length of 31. One of my items has a 42 length, hence the error. I did find a cool set of code to validate my OpenXml. Link.
UPDATE:
Oddly enough, someone thinks this question was about finding some code to help validate what I was doing. It was not... The question is clear: why was I receiving an error when trying to name sheets. I was not asking for validation code, though I found some.
I do ask that if you wish to help, please read the question versus assume what I was asking, and if you don't know what I wish to have answered, ask...
In order to find out the issue(s) causing this error, you need to validate the generated document.
Besides using the built in validation method as described here, which doesn't show you all issues as I found out, I suggest that you download and install Microsoft's Open XML SDK 2.5 for Microsoft Office.
It contains Microsoft's Open XML SDK 2.5 Productivity Tool, which is very helpful here:
Create a copy of the damaged XLSX file, and apply the fixes that Microsoft Excel suggests (suppose you now have the files FileName_corrupt.xlsx and FileName_fixed.xlsx).
Then, run Microsoft's Open XML SDK 2.5 Productivity Tool, open FileName_corrupt.xlsx, select "Compare Files" and specify the 2nd file FileName_fixed.xlsx. This allows you to compare the XML structure of both files.
Let Microsoft's Open XML SDK 2.5 Productivity Tool generate C# code from both files: Open them first, then right-click on the root level and select "Reflect Code". This will create C# code which allows you to generate the same file. Save both C# code versions (i.e. FileName_corrupt.cs and FileName_fixed.cs)
Now you can compare the differences via Visual Studio: Either use
devenv.exe /diff FileName_corrupt.cs FileName_fixed.cs
to compare them, or use the batch file I've created to launch the VS compare. This is a hidden feature in Visual Studio that lets you compare two local files that are not part of TFS.
This way you should be able to work out the differences and fix your code.
NOTE: For a first validation, I suggest using the validation code. Only if it still fails, use the steps above. For validation you can use
public static string ValidateOpenXmlDocument(OpenXmlPackage pXmlDoc, bool throwExceptionOnValidationFail=false)
{
using (var docToValidate = pXmlDoc)
{
var validator = new DocumentFormat.OpenXml.Validation.OpenXmlValidator();
var validationErrors = validator.Validate(docToValidate).ToList();
var errors = new System.Text.StringBuilder();
if (validationErrors.Any())
{
var errorMessage = string.Format("ValidateOpenXmlDocument: {0} validation error(s) with document", validationErrors.Count);
errors.AppendLine(errorMessage);
errors.AppendLine();
}
foreach (var error in validationErrors)
{
errors.AppendLine("Description: " + error.Description);
errors.AppendLine("ErrorType: " + error.ErrorType);
errors.AppendLine("Node: " + error.Node);
errors.AppendLine("Path: " + error.Path.XPath);
errors.AppendLine("Part: " + error.Part.Uri);
if (error.RelatedNode != null)
{
errors.AppendLine("Related Node: " + error.RelatedNode);
errors.AppendLine("Related Node Inner Text: " + error.RelatedNode.InnerText);
}
errors.AppendLine();
errors.AppendLine("==============================");
errors.AppendLine();
}
if (validationErrors.Any() && throwExceptionOnValidationFail)
{
throw new Exception(errors.ToString());
}
if (errors.Length > 0)
{
System.Diagnostics.Debug.WriteLine(errors.ToString());
}
return errors.ToString();
}
}
along with
public static void ValidateExcelDocument(string fileName)
{
using (var xlsx = SpreadsheetDocument.Open(fileName, true))
{
ValidateOpenXmlDocument(xlsx);
}
}
With a slight modification, you can easily use the code above for Microsoft Word validation too:
public static void ValidateWordDocument(string fileName)
{
using (var docx = WordprocessingDocument.Open(fileName, true))
{
ValidateOpenXmlDocument(docx);
}
}
I've figured out the problem. The issue is the Name property has a Max Length of 31 characters. The text I'm trying to use sometimes exceeds that limit (one has 42 characters). I also found a pretty cool set of code to validate my Open Xml to find out what the specific issue is. Link
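To avoid hitting the 31-character limit in the loop from the question, the name can be cleaned up before it is assigned to the Sheet. The following is only a minimal sketch: the class SheetNames and the method ToSafeSheetName are hypothetical names, and the choices to truncate, to replace forbidden characters with an underscore, and to append a numeric suffix on collisions are assumptions, not something the SDK does for you.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SheetNames
{
    // Excel rejects sheet names longer than 31 characters.
    private const int MaxLength = 31;

    // Characters Excel does not allow in a sheet name.
    private static readonly char[] Invalid = { '\\', '/', '?', '*', '[', ']', ':' };

    // Hypothetical helper: returns a name that is at most 31 characters long,
    // contains none of the forbidden characters, and is not already in 'used'.
    // The returned name is added to 'used'.
    public static string ToSafeSheetName(string raw, ISet<string> used)
    {
        var sb = new StringBuilder((raw ?? string.Empty).Trim());
        foreach (var c in Invalid)
        {
            sb.Replace(c, '_');
        }

        var baseName = sb.Length == 0 ? "Sheet" : sb.ToString();
        if (baseName.Length > MaxLength)
        {
            baseName = baseName.Substring(0, MaxLength);
        }

        // On a collision, shorten the base name to make room for "_2", "_3", ...
        var candidate = baseName;
        for (var n = 2; !used.Add(candidate); n++)
        {
            var suffix = "_" + n;
            var keep = Math.Min(baseName.Length, MaxLength - suffix.Length);
            candidate = baseName.Substring(0, keep) + suffix;
        }

        return candidate;
    }
}
```

In the loop this would be used as Name = SheetNames.ToSafeSheetName(techArea, usedNames), where usedNames is a HashSet<string> created once before the loop with StringComparer.OrdinalIgnoreCase, since Excel treats sheet names as case-insensitive.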
Related
I'm trying to implement this feature in my application.
Just like in Windows, I type into the search box and, if "File contents" is checked in the settings, the search returns every file that contains the search string, whether it is a text file or a PDF/Word file.
I have already built an application for file and folder search that works pretty well for content search in text files and Word files. I'm using Word interop for the Word files.
I know I can use iTextSharp or some other third-party library to do this for PDF files, but that doesn't satisfy me. I want to find out how Windows does it, or whether anyone has done it a different way. I would prefer not to use a third-party tool, though that doesn't mean I can't; I just want to keep my application light and not load it with many tools.
As far as I know, it is not possible to search PDF content without a third-party tool, program, or utility installed; pdfgrep is one example. If you are writing a C# program anyway, I would include a third-party library to do the job.
I made a solution for something similar in this answer, Read specific value based on label name from PDF in C#; with a bit of tweaking you can get what you are looking for. The only catch is that PdfClown targets .NET Framework, but on the other hand it is open source, free, and has no limitations. If you need .NET Core, you might find some free (with limitations) or paid PDF libraries.
As you requested in the comments, here is a sample solution to find text inside PDF pages. I have left comments inside the code:
//The found content
private List<string> _contentList;
//Search for content in a given pdf file
public bool SearchPdf(FileInfo fileInfo, string word)
{
_contentList = new List<string>();
ExtractPages(fileInfo.FullName);
var content = string.Join(" ", _contentList);
return content.Contains(word);
}
//Extract content for each page of given pdf file
private void ExtractPages(string filePath)
{
using (var file = new File(filePath))
{
var document = file.Document;
foreach (var page in document.Pages)
{
Extract(new ContentScanner(page));
}
}
}
//Extract content of pdf page and put the found result inside _contentList
private void Extract(ContentScanner level)
{
if (level == null)
return;
while (level.MoveNext())
{
var content = level.Current;
switch (content)
{
case ShowText text:
{
var font = level.State.Font;
_contentList.Add(font.Decode(text.Text));
break;
}
case Text _:
case ContainerObject _:
Extract(level.ChildLevel);
break;
}
}
}
Now let's do a quick test, assuming all your invoices are in the c:\temp folder:
static void Main(string[] args)
{
var program = new SearchPdfContent();
DirectoryInfo d = new DirectoryInfo(@"c:\temp");
FileInfo[] Files = d.GetFiles("*.pdf");
var word = "Sushi";
foreach (FileInfo file in Files)
{
var found = program.SearchPdf(file, word);
if (found)
{
Console.WriteLine($"{file.FullName} contains word {word}");
}
}
}
In my case, for example, I have the word sushi inside the invoice:
c:\temp\invoice0001.pdf contains word Sushi
All that said, this is just an example solution. You can take it from here and bring it to the next level. Enjoy your day.
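One detail worth knowing about the SearchPdf method above: string.Contains(string) performs a case-sensitive comparison, so searching for "Sushi" will not match text that was extracted as "sushi". If a Windows-search-like, case-insensitive match is wanted, a small helper along these lines could replace the Contains call. This is a sketch; TextSearch and ContainsIgnoreCase are hypothetical names, and ordinal comparison is an assumption that may not suit every language.

```csharp
using System;

public static class TextSearch
{
    // Hypothetical helper: true if 'word' occurs anywhere in 'content',
    // ignoring case. Null or empty inputs never match.
    public static bool ContainsIgnoreCase(string content, string word)
    {
        if (string.IsNullOrEmpty(content) || string.IsNullOrEmpty(word))
        {
            return false;
        }

        return content.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

SearchPdf would then end with return TextSearch.ContainsIgnoreCase(content, word); instead of return content.Contains(word);.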
I leave some links of what I have searched for:
Searching for files with specific file content
How to search contents of multiple pdf files?
Windows search PDF contents
https://superuser.com/questions/402673/how-to-search-inside-pdfs-with-windows-search
If your application is meant to search for file contents from binaries stored into your DB, the SQL Full-Text search feature can achieve this for you.
You just need to make sure that you have the required IFilters installed and create a full-text index on the table where the binary data is stored.
But if your application must access a folder in real time and search for file contents, you will probably need a third-party tool, just like @maytham-ɯɐɥʇʎɐɯ said.
I use the EPPlus library to batch edit some existing XLSM files. Inside the files I replace a line of VBA code and that's it. Everything works fine if I edit the same line by hand in the Excel code editor.
When I open some of the files with Excel 2013 (15.0.4989.1000), the following error message is shown.
We found a problem with some content in 'test.xlsm'. Do you want us to
recover as much as we can? If you trust the source of this workbook,
click Yes.
If I click yes, the repair report shows the following entry. But the message is somewhat too generic to help me further.
Removed Records: Named range from /xl/workbook.xml-Part (Arbeitsmappe)
This is my C# code, which edits the XLSM file. Can I update my code or do I have to update the XLSM-file before editing it?
static void PatchVba(string filePath, string oldCode, string newCode)
{
var wbFileInfo = new FileInfo(filePath);
using (var package = new ExcelPackage(wbFileInfo, false))
{
foreach (var m in package.Workbook.VbaProject.Modules)
{
if (m.Code.Contains(oldCode))
{
m.Code = m.Code.Replace(oldCode, newCode);
Console.WriteLine("VBA Patched in \"{0}\"", filePath);
}
}
try
{
package.SaveAs(wbFileInfo);
}
catch
{
Console.WriteLine("Could not save patched file \"{0}\".", filePath);
}
}
}
I found out what the problem is. In the edited XLSM-file, a range name is used multiple times with overlapping scope. I was too focused on my C# code to find the root cause.
So removing the named ranges solves the issue. But it would still be interesting to know, why I can edit it without problems using Excel, but not by using EPPlus.
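To spot such names before patching a file, the defined names can be read straight from the workbook part without EPPlus. The sketch below is only one possible reading of "used multiple times with overlapping scope": it reports names that occur more than once within the same scope, compared case-insensitively. DefinedNameCheck and FindDuplicates are hypothetical names, and xl/workbook.xml is assumed to be the workbook part, which is the conventional location but not guaranteed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DefinedNameCheck
{
    private static readonly XNamespace Main =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Returns "NAME (scope)" for every defined name that appears more than
    // once in the same scope. The scope is the localSheetId attribute
    // (a zero-based sheet index) or "workbook" when the attribute is absent.
    public static List<string> FindDuplicates(XDocument workbookXml)
    {
        return workbookXml
            .Descendants(Main + "definedName")
            .Select(e => new
            {
                Name = ((string)e.Attribute("name") ?? string.Empty).ToUpperInvariant(),
                Scope = (string)e.Attribute("localSheetId") ?? "workbook"
            })
            .GroupBy(x => x.Name + " (" + x.Scope + ")")
            .Where(g => g.Count() > 1)
            .Select(g => g.Key)
            .ToList();
    }

    // Convenience overload: opens the .xlsx/.xlsm package as a zip archive
    // and inspects xl/workbook.xml (assumed location of the workbook part).
    public static List<string> FindDuplicates(string path)
    {
        using (var stream = File.OpenRead(path))
        using (var zip = new ZipArchive(stream, ZipArchiveMode.Read))
        {
            var entry = zip.GetEntry("xl/workbook.xml");
            if (entry == null)
            {
                return new List<string>();
            }

            using (var xml = entry.Open())
            {
                return FindDuplicates(XDocument.Load(xml));
            }
        }
    }
}
```

Calling DefinedNameCheck.FindDuplicates(filePath) before PatchVba and logging a non-empty result would at least flag the files that need their named ranges cleaned up first.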
I have a VSTO document level customization that performs specific functionality when opened from within our application. Basically, we open normal documents from inside of our application and I copy the content from the normal docx file into the VSTO document file which is stored inside of our database.
var app = new Microsoft.Office.Interop.Word.Application();
var docs = app.Documents;
var vstoDoc = docs.Open(vstoDocPath);
var doc = docs.Open(currentDocPath);
doc.Range().Copy();
vstoDoc.Range().PasteAndFormat(WdRecoveryType.wdFormatOriginalFormatting);
Everything works great; however, the above code leaves out certain formatting from the document. The code below fixes these issues, but I will most likely come across more, and I could only address them one by one as they appear ...
for (int i = 0; i < doc.Sections.Count; i++)
{
var footerFont = doc.Sections[i + 1].Footers.GetEnumerator();
var headerFont = doc.Sections[i + 1].Headers.GetEnumerator();
var footNoteFont = doc.Footnotes.GetEnumerator();
foreach (HeaderFooter foot in vstoDoc.Sections[i + 1].Footers)
{
footerFont.MoveNext();
foot.Range.Font.Name = ((HeaderFooter)footerFont.Current).Range.Font.Name;
}
foreach (HeaderFooter head in vstoDoc.Sections[i + 1].Headers)
{
headerFont.MoveNext();
head.Range.Font.Name = ((HeaderFooter)headerFont.Current).Range.Font.Name;
}
foreach (Footnote footNote in vstoDoc.Footnotes)
{
footNoteFont.MoveNext();
footNote.Range.Font.Name = ((Footnote)footNoteFont.Current).Range.Font.Name;
}
}
I need a foolproof, safe way of copying the content of one docx file to another docx file while preserving formatting and eliminating the risk of corrupting the document. I've tried using reflection to copy the properties of one document to the other, but the code starts to look ugly and I always worry that some of the properties I'm setting may have undesirable side effects. I've also tried unzipping the docx files, editing the XML manually, and rezipping afterwards; this hasn't worked well, and I ended up corrupting a few documents in the process.
If anyone has dealt with a similar issue in the past, please could you point me in the right direction.
Thank you for your time
This code copies and keeps source formatting.
bookmark.Range.Copy();
Document newDocument = WordInstance.Documents.Add();
newDocument.Activate();
newDocument.Application.CommandBars.ExecuteMso("PasteSourceFormatting");
There is one more elegant way to manage it based upon
Globals.ThisAddIn.Application.ActiveDocument.Range().ImportFragment(filePath);
or you can do the following
Globals.ThisAddIn.Application.Selection.Range.ImportFragment(filePath);
to use the current selection's range instead, where filePath is the path to the document you are copying from.
So I have some code that can read the methods out of a .coverage file...
using (CoverageInfo info = CoverageInfo.CreateFromFile(this.myCoverageFile))
{
CoverageDS ds = info.BuildDataSet();
foreach (ICoverageModule coverageModule in info.Modules)
{
CodeModule currentModule = new CodeModule(coverageModule.Name);
byte[] coverageBuffer = coverageModule.GetCoverageBuffer(null);
using (ISymbolReader reader = coverageModule.Symbols.CreateReader())
{
Method currentMethod;
while (reader.GetNextMethod(out currentMethod, coverageBuffer))
{
if (currentMethod != null)
{
currentModule.Methods.Add(currentMethod);
}
}
}
returnModules.Add(currentModule);
}
}
... but I want to be able to read .coverage files that have been exported to xml too. The reason for this is that .coverage files require the source dlls be in the exact location they were when code coverage was measured, which doesn't work for me.
When I try to load a coveragexml file using CreateFromFile(string) I get the following exception.
Microsoft.VisualStudio.Coverage.Analysis.InvalidCoverageFileException
was unhandled Message=Coverage file
"unittestcoverage.coveragexml" is
invalid or corrupt.
The coveragexml file opens in Visual Studio just fine, so I don't believe there's any issue with the file's format.
I know that CoverageDS can import an xml file, but the API is less than intuitive and the only example I can find of its use is...
using(CoverageInfo info = CoverageInfo.CreateFromFile(fileString))
{
CoverageDS data = info.BuildDataSet();
data.ExportXml(xmlFile);
}
...which tells me nothing about how to actually read the coverage data from that file.
Does someone know how to process code coverage data from a .coveragexml file?
Probably the best introduction to manipulating code coverage information programmatically is available here and also in the linked ms_joc blog.
I'm pretty sure you can use 'CreateInfoFromFile' with either the .coverage file or the XML file you exported in the sample above.
UPDATE:
CreateInfoFromFile throws an exception if the coveragexml is passed as the argument. Here is an alternative:
CoverageDS dataSet = new CoverageDS();
dataSet.ImportXml(@"c:\temp\test.coveragexml");
foreach (CoverageDSPriv.ModuleRow module in dataSet.Module)
{
Console.WriteLine(String.Format("{0} Covered: {1} Not Covered: {2}", module.ModuleName, module.LinesCovered, module.LinesNotCovered));
}
Have you tried the CoverageDS.ReadXml(fileName_string) method?
I'm developing a solution that allows people to upload a DOCX file as a template. This template is used for generating Word documents with database info.
What I would like to do is once a template gets uploaded, to check it for errors. (I don't want my parser crashing when a template is used.)
I've seen the question about checking a signature of a Word template, but that isn't enough to validate the integrity of the file. Of course it is possible to try to unzip the file, validate the XML in there, and so on, but this is rather CPU intensive and I'd like a different approach if there is one.
Are there any solutions that are part of the Open XML SDK, or other standard approaches to this? Any ideas are appreciated.
In C#, from the MSDN site:
public static bool IsDocumentValid(WordprocessingDocument mydoc)
{
OpenXmlValidator validator = new OpenXmlValidator();
var errors = validator.Validate(mydoc);
foreach (ValidationErrorInfo error in errors)
Debug.Write(error.Description);
return (errors.Count() == 0);
}