How to access the author name and other docx metadata - c#

I want to use C# to get the metadata of a file, for example a docx.
In the screenshot below you see the author and other metadata of a file.
How do I write this metadata to the console?

A Word file in DOCX format is packaged as a zip file. The metadata is in an XML file within that zip file.
As a very simple way to think about it, this is what you would need to do programmatically in C#:
Unzip the DOCX file into its folder structure.
Open the core.xml file located in the docProps folder of that structure.
Pull out and store the relevant XML elements that you are looking for, such as title, subject or whatever.
Write those elements with Console.WriteLine().
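The steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it reads docProps/core.xml straight out of the archive with ZipArchive instead of unzipping to disk, the class and method names (DocxMetadata, LoadCoreProperties, PrintCoreProperties) are made up for illustration, and on .NET Framework it assumes a reference to System.IO.Compression.FileSystem.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

// Hypothetical helper class; the names are illustrative only.
public static class DocxMetadata
{
    // XML namespaces used inside docProps/core.xml.
    private static readonly XNamespace Dc = "http://purl.org/dc/elements/1.1/";
    private static readonly XNamespace Cp =
        "http://schemas.openxmlformats.org/package/2006/metadata/core-properties";

    // Returns the root element of docProps/core.xml, or null if the part is missing.
    public static XElement LoadCoreProperties(string docxPath)
    {
        // A .docx is a zip archive, so it can be read without extracting to disk.
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = archive.GetEntry("docProps/core.xml");
            if (entry == null)
            {
                return null;
            }

            using (Stream stream = entry.Open())
            {
                return XDocument.Load(stream).Root;
            }
        }
    }

    public static void PrintCoreProperties(string docxPath)
    {
        XElement core = LoadCoreProperties(docxPath);
        if (core == null)
        {
            Console.WriteLine("No docProps/core.xml found.");
            return;
        }

        // The (string) cast yields null when an element is absent.
        Console.WriteLine("Author:           " + (string)core.Element(Dc + "creator"));
        Console.WriteLine("Title:            " + (string)core.Element(Dc + "title"));
        Console.WriteLine("Subject:          " + (string)core.Element(Dc + "subject"));
        Console.WriteLine("Last modified by: " + (string)core.Element(Cp + "lastModifiedBy"));
    }
}
```

Calling DocxMetadata.PrintCoreProperties with the path to a docx file writes the values to the console. The author is stored in the dc:creator element, which is why there is no element literally named "author".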
Image Showing Structure and XML file
Info on Office Open XML format

Related

How to properly programmatically convert an XLSX file to HTML using C#?

At work, we're modifying an XLSX file, and we would like to turn the modified file into an HTML file (in order to convert it into PDF using Puppeteer#, but that's not the point here).
We know how to get the XML files of this XLSX, and I already found XslCompiledTransform to convert XML files to HTML.
The annoyance is that, from what I have read about XslCompiledTransform, transforming an XML file to HTML requires one stylesheet plus one XML file.
This brings three problems:
It looks like the stylesheet in the XLSX for each sheet isn't well formatted for use with XslCompiledTransform.
The XLSX file contains multiple sheets, so we would have to merge them in some manner, and we don't know how.
These are not just random XML files; they're parts of an XLSX file. There are therefore other XML files in addition to the sheets (like the workbook and other files), and we can't figure out how to generate an HTML file that looks precisely like the XLSX file as opened in Excel without using these XML files.
These problems can be summarized as: we struggle to find a way to generate an HTML file that looks exactly like the original whole XLSX file.
We don't specifically want to create an HTML file from XML files, so any means of transforming an XLSX to HTML is good.
We also know that there are tools and libraries available that do this directly, but all the ones I've found aren't free, and we would like to avoid paying for one, as it's the first time we need it and maybe the last.
Does anyone know an accurate option to programmatically transform an XLSX file to HTML, keeping every style option, using C#?

Is there any way to store Autotext (GlossaryDocument) in docx instead of dotx

I need to make a Word file with some AutoText (generated from a database).
Currently I programmatically generate a Word document (docx) and a template for it (dotx). The dotx contains the list of AutoText entries (in a GlossaryDocument), and in the docx file I add a relationship to it:
documentSettingPart1.AddExternalRelationship("http://schemas.openxmlformats.org/officeDocument/2006/relationships/attachedTemplate", new Uri($"file:./{Path.GetFileName(dotxTemplate)}", UriKind.Relative) , relationId);
So if the user saves both files in the same directory and opens the docx, they can use the AutoText perfectly. But I am looking for a way to do this in a single docx file, because it's inconvenient for users to have two files and to make sure they are in the same directory.
I tried adding a GlossaryDocumentPart to the docx, or changing the document type (ChangeDocumentType(WordprocessingDocumentType.Document)). After that I can see the GlossaryDocument in the Open XML SDK, but when I open the docx file in Word, none of the AutoText from this GlossaryDocument is there.
Is there any way to make a docx file that contains the AutoText itself?
A docx file cannot contain AutoText (Building Blocks). It is simply not supported. But why not save the document you're distributing as a template, so the user can use it to create a new document whenever required? That's what templates are for...
What is possible is to store the Word Open XML that represents the content to be re-used in one or more Custom XML Parts. You'd need to code some kind of interface to enable the user to retrieve and insert this content. If that code should travel with the document, it would have to be VBA, and the file would then need to be a docm rather than a docx.
Given Word 2013 or newer, it's also possible to map/link a content control to a node in a Custom XML Part. But, again, this would require you to develop some kind of interface for the user.
Also possible would be a VSTO or Word JS API solution rather than VBA.

How can I convert an uploaded pptx/ppt file to XML format in C# using OpenXML?

I have an application which allows uploading a ppt/pptx file. I want to convert the presentation file to equivalent XML format.
A pptx file is essentially a zip file with a pptx extension. If you rename it to .zip and extract the contents, you can find the XML documents.
With ppt you have a bigger problem, as it is proprietary to Microsoft and may not even be publicly documented. Office automation would most probably work, but it is rather complicated.
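For the pptx case, the "rename and extract" step can also be done in code. The sketch below is only an illustration under some assumptions: it uses ZipArchive rather than the Open XML SDK, it assumes the slides sit at the conventional ppt/slides/slideN.xml locations inside the package, and the PptxXml class and ReadSlideXml method are hypothetical names. It does not handle the binary ppt format.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper class; the names are illustrative only.
public static class PptxXml
{
    // Returns a map from part name (e.g. "ppt/slides/slide1.xml") to its XML text.
    public static Dictionary<string, string> ReadSlideXml(string pptxPath)
    {
        var slides = new Dictionary<string, string>();

        // A .pptx is a zip archive, so no renaming is needed to open it.
        using (ZipArchive archive = ZipFile.OpenRead(pptxPath))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                // Assumption: slide parts follow the usual ppt/slides/slideN.xml naming.
                bool isSlide =
                    entry.FullName.StartsWith("ppt/slides/slide", StringComparison.OrdinalIgnoreCase) &&
                    entry.FullName.EndsWith(".xml", StringComparison.OrdinalIgnoreCase);
                if (!isSlide)
                {
                    continue;
                }

                using (var reader = new StreamReader(entry.Open()))
                {
                    slides[entry.FullName] = reader.ReadToEnd();
                }
            }
        }

        return slides;
    }
}
```

Each value in the returned dictionary is the raw XML of one slide, which can be passed to XDocument.Parse for further processing.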

How to programmatically read xml file inside cab file

Is there any way for me to read contents of an xml file in a cab file in C#? I know how to use XDocument to load an xml file and read its contents, but not sure if it is possible to read an xml file that is zipped up in a cab file.
Any ideas?
What you are looking to do is to extract the contents of the CAB file first. You can either write the code to do that yourself or use a 3rd party library.
I have not used this personally, but I have seen it mentioned several times on this site and others: http://www.codeproject.com/KB/files/CABCompressExtract.aspx
To take a stab at writing it yourself, refer to the documentation here: http://msdn.microsoft.com/en-us/library/cc483132%28EXCHG.80%29.aspx

How to convert PDF file to XML

Can anybody tell me how to convert a PDF file to an XML file?
I want to store a resume in an XML file, i.e. if a user uploads a resume as a PDF file, it is converted to an XML file that stores only some basic details like name, address, education details, etc.
Please give me an answer.
You have to read the PDF file using iTextSharp.
After that, filter out the information you require and then create an XML file.
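The second step, writing the filtered details to XML, could look like the sketch below. It covers only the XML side: extracting and parsing the text from the PDF (for example with iTextSharp) is not shown, and the ResumeXml class, the Build method, and the element names are hypothetical choices for illustration.

```csharp
using System.Xml.Linq;

// Hypothetical helper class; element names are illustrative, not a fixed schema.
public static class ResumeXml
{
    // Builds an XML document from details already extracted from the PDF text.
    public static XDocument Build(string name, string address, string education)
    {
        return new XDocument(
            new XDeclaration("1.0", "utf-8", null),
            new XElement("resume",
                new XElement("name", name),
                new XElement("address", address),
                new XElement("education", education)));
    }
}
```

The resulting document can be written to disk with the Save method of XDocument, passing the target file name.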
