Correct way of parsing XMP XML metadata attached to the end of a PDF file? [closed] - c#

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
I have a PDF with some metadata in XMP XML format attached to the end. What is the correct way of parsing and using this metadata?
At the moment I have a working solution in C99 that reads the file character by character from the beginning, looping until it reaches the tag I'm after and then recording the contents until it reaches the closing tag. I can't see this being the best way of doing things.
I'm now rewriting this program in C# on Mono (not .NET), and I wonder whether there is a framework class for this task, so that I don't just have to imitate the C99 version. (Also, I can only rely on third-party libraries if they don't contain any P/Invoke calls, etc.)
I'm using Mono because I need this app to be cross-platform.

Adobe has published the XMP specification. Give it a try. You need to find out what XMP schema the XML uses and parse it accordingly.

If you can get the complete XML as a string, you can use XmlDocument.LoadXml to load it into memory for querying (XmlDocument.Load expects a file name, stream, or reader rather than the XML text itself).
You can then use XPath with the XmlDocument.SelectNodes method to get to your data.
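As a rough sketch of that approach, and not a complete XMP reader: the helper names ExtractXmpPacket and ReadTitles, the sample data, and the choice of dc:title are all made up for illustration. It assumes the packet is stored uncompressed, is UTF-8, and is wrapped in an x:xmpmeta element, none of which is guaranteed for every PDF. XMP elements are namespace-qualified, so the XPath query needs an XmlNamespaceManager. Only System.Xml is used, so no P/Invoke is involved. It is written with top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Xml;

// Hypothetical helper (not a framework API): returns the first <x:xmpmeta> element
// found in the raw file bytes, or an empty string if there is none.
static string ExtractXmpPacket(byte[] fileBytes)
{
    // ISO-8859-1 maps each byte to exactly one char, so string offsets equal byte offsets.
    string raw = Encoding.GetEncoding("iso-8859-1").GetString(fileBytes);
    const string endTag = "</x:xmpmeta>";
    int start = raw.IndexOf("<x:xmpmeta", StringComparison.Ordinal);
    if (start < 0) return "";
    int end = raw.IndexOf(endTag, start, StringComparison.Ordinal);
    if (end < 0) return "";
    // Assumption: the packet is UTF-8, so decode just that slice as UTF-8.
    return Encoding.UTF8.GetString(fileBytes, start, end + endTag.Length - start);
}

// Hypothetical helper: collects every dc:title entry from the XMP XML via XPath.
static List<string> ReadTitles(string xmpXml)
{
    var doc = new XmlDocument();
    doc.LoadXml(xmpXml); // LoadXml takes the XML text itself

    // Prefixes used in the XPath must be registered with their namespace URIs.
    var ns = new XmlNamespaceManager(doc.NameTable);
    ns.AddNamespace("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
    ns.AddNamespace("dc", "http://purl.org/dc/elements/1.1/");

    var titles = new List<string>();
    var nodes = doc.SelectNodes("//dc:title/rdf:Alt/rdf:li", ns);
    if (nodes != null)
    {
        foreach (XmlNode node in nodes)
            titles.Add(node.InnerText);
    }
    return titles;
}

// Made-up stand-in for a PDF file: a header line followed by an XMP packet.
string sampleFile =
    "%PDF-1.4\n" +
    "<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>" +
    "<x:xmpmeta xmlns:x='adobe:ns:meta/'>" +
    "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>" +
    "<rdf:Description rdf:about='' xmlns:dc='http://purl.org/dc/elements/1.1/'>" +
    "<dc:title><rdf:Alt><rdf:li xml:lang='x-default'>Sample title</rdf:li></rdf:Alt></dc:title>" +
    "</rdf:Description></rdf:RDF></x:xmpmeta>" +
    "<?xpacket end='w'?>\n%%EOF";
byte[] sampleBytes = Encoding.UTF8.GetBytes(sampleFile);

string xmp = ExtractXmpPacket(sampleBytes);
foreach (string title in ReadTitles(xmp))
    Console.WriteLine(title); // prints "Sample title"
```

A real file can contain more than one packet (for example after incremental saves), so you may want the last match rather than the first, and you would read the bytes with File.ReadAllBytes instead of building them in memory.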

Related

Fast, multi-threaded and free HTML to PDF converter in C# for A4 documents [closed]

Closed 2 days ago.
I would like to ask for your advice. I need a converter that creates A4 PDF documents from HTML. It must let me define the document margins, and CSS rendering must work (the CSS is defined inside the HTML string). I don't need to save anything to a file: I send the converter HTML as a string and it returns the PDF as a byte array. It has to be as fast as possible, able to convert 5000 HTML strings into PDF documents of one or two pages each in a reasonable time. The converter must work with a C# ASP.NET Core application.
So far I have tried these converters:
Tuespechkin
Dink
Both work well but very slowly: the convert method takes a very long time. Unfortunately it can't be called in parallel either; even if I create multiple threads, the calls are always executed serially.
HtmlRenderer.PdfSharp
It is very fast (several times faster than Tuespechkin and Dink) and runs in parallel, but the rendered PDF looks bad: some important CSS styles are ignored, and even when I choose the A4 format some text runs beyond the edge of the page.
I also read this entire thread but found nothing helpful to solve my problem: Convert HTML to PDF in .NET
Thanks for every reply

How to convert PDF to WORD in c# [closed]

Closed 4 years ago.
Does anyone know a .NET component to convert PDF to Word or RTF programmatically? I don't want to use OCR or Adobe-dependent solutions.
I tried several libraries:
PDF Focus .NET: https://sautinsoft.com/products/pdf-focus/index.php
Aspose.PDF: https://products.aspose.com/pdf/net
Gembox: https://www.gemboxsoftware.com/document
Spire.PDF: https://www.e-iceblue.com/Introduce/pdf-for-net-introduce.html
I also considered using Word via COM automation to open and save to PDF programmatically.
Among all of them I liked PDF Focus .NET best of all, and I will explain why:
They try to keep the structure of the document EDITABLE, so that when I continue editing the text, the paragraph is smoothly prolonged. Other libraries take a "minimalistic" approach and insert absolutely positioned shapes, so that if you continue editing the text, it overlaps with the next piece of text.
They do their best to recognize tables, so that tables in the output document are REAL TABLES, not a collection of shapes and texts with absolute positioning (as produced by other libraries).
A customer of ours is now evaluating different libraries, and I will recommend PDF Focus .NET first of all.
P.S. I AM NOT INVOLVED IN ANY KIND OF RELATIONSHIP WITH THIS SOFTWARE PRODUCER. As a former .NET developer, I simply see high-quality components that really work well.
Use PDF Focus.
Nice and easy.
EDIT: And also
How to convert DOC into other formats using C#
http://dotnetf1.blogspot.com/2008/07/convert-word-doc-into-pdf-using-c-code.html
You need something like GemBox.Document. It's a simple .NET component that enables you to manipulate and convert all kinds of document files.
You should have read this: C# and PDF. There are ways to convert, like the aforementioned PDF Focus, but be warned: it is a buggy, crash-prone process. PDF is not intended to be PC-readable.

(C#) Convert PDF to page-by-page JPEGs (needed for a dynamic page flip engine) [closed]

Closed 7 years ago.
I need to convert a PDF file to JPEG images, page by page.
The reason is that I'm creating a dynamic page flip on an ASP.NET website. I'm currently using the MegaZine pageflip engine, which loads JPEG files and makes a flip book. All I need to do now is convert the PDF file to a bunch of JPEGs, so that when the user clicks "upload pdf and make pageflip", the code-behind does all the work.
So is there a (free) library I can use to do this?
I've been googling for some time, but could not find anything good. Maybe you guys know something.
Thanks for the answer in advance!!!
Andrej
Have you thought about using MagickNet? It's the .NET interface to ImageMagick, which is the go-to lib for this kind of task.
I would recommend using either PDF2Image, PDF2Cairo or MUPDF to do this. Here's a link to PDF2Image:
http://code.google.com/p/pdf2image

Import / read / load variables from matlab matfile in C# [closed]

Closed 4 years ago.
I've been searching for a while to see if anyone has done any work on reading MATLAB .mat files in C#, and it seems that there is nothing out there.
Has anybody seen any solutions?
I can't simply export my .mat files to text and then read them into my C# code, because they contain fairly complex structures.
I'd rather not interoperate with MATLAB, and I don't want to use Python (SciPy with loadmat) either.
One option to try is the submission CSMatIO by David Zier on the MathWorks File Exchange. It's an API for .NET 2.0 that will allow you to read level 5 .mat files.
If you have to read newer .mat file formats, you can first load your .mat file into MATLAB and resave it as an older format using the SAVE function's version option.
ILNumerics is able to read and write to/from Matlab mat files, version 6.
Since CSMatIO doesn't seem to be supported, I'd like to share a link to a similar library for reading/writing MATLAB .mat files: MatFileHandler, which targets .NET Standard 2.0.
I wanted to add another alternative: the Accord.Math library (available via NuGet, or here: http://accord-framework.net/) provides a .mat file reader.

XSD for XML documentation generated for C#? [closed]

Closed 5 years ago.
Does anyone know if there is an XSD file somewhere that can be used to validate the XML documentation that gets generated when you compile a C# project with the /doc option?
I want to modify that file manually after it's been generated and I'm looking for an easy way to confirm that I haven't damaged the structure of the file.
Thanks.
I finally broke down and wrote one: XSD for Xml Comments for .NET Documentation
I stumbled across this old question today. I didn't find such a schema in Microsoft's documentation, nor in other projects that I thought might have an interest in developing one, namely the sources for the Sandcastle and (long-defunct) NDoc projects.
Short of stepping back to try to define a schema on your own, one thing I could suggest would be to use one of the many tools that will generate an XSD from XML. Microsoft includes XSD.EXE as part of Visual Studio and its SDKs.
You could write dummy source that exercises each of the XML documentation comment tags, build the XML documentation file for it, then use XSD.EXE to generate an XSD from that, and use it to validate the XML doc after your processing is done. But I think that could turn out to be less trivial than it sounds.
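For the validation step itself, here is a minimal sketch using XmlReaderSettings with ValidationType.Schema. The helper name ValidateAgainstXsd and the tiny schema are made up for illustration: the schema is a hand-written stand-in covering only a few elements, not the real structure of a /doc file, so you would substitute whatever XSD you end up generating. It is written with top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper: validates xml against xsd and returns every problem reported.
static List<string> ValidateAgainstXsd(string xml, string xsd)
{
    var problems = new List<string>();
    var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
    // A null target namespace means "use the one declared in the schema itself".
    settings.Schemas.Add(null, XmlReader.Create(new StringReader(xsd)));
    // Elements the schema does not know about are only warnings, so report those too.
    settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
    settings.ValidationEventHandler += (sender, e) => problems.Add(e.Severity + ": " + e.Message);

    using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
    {
        while (reader.Read()) { } // validation happens as the reader advances
    }
    return problems;
}

// Illustrative stand-in schema (an assumption, far simpler than a real doc file).
string sampleXsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='doc'>
    <xs:complexType><xs:sequence>
      <xs:element name='assembly'>
        <xs:complexType><xs:sequence>
          <xs:element name='name' type='xs:string'/>
        </xs:sequence></xs:complexType>
      </xs:element>
      <xs:element name='members'>
        <xs:complexType><xs:sequence>
          <xs:element name='member' minOccurs='0' maxOccurs='unbounded'>
            <xs:complexType>
              <xs:sequence>
                <xs:element name='summary' type='xs:string' minOccurs='0'/>
              </xs:sequence>
              <xs:attribute name='name' type='xs:string' use='required'/>
            </xs:complexType>
          </xs:element>
        </xs:sequence></xs:complexType>
      </xs:element>
    </xs:sequence></xs:complexType>
  </xs:element>
</xs:schema>";

string goodDoc = "<doc><assembly><name>Demo</name></assembly><members>" +
                 "<member name='M:Demo.Run'><summary>Runs.</summary></member></members></doc>";
// Same document, but the required name attribute was lost during manual editing.
string damagedDoc = "<doc><assembly><name>Demo</name></assembly><members>" +
                    "<member><summary>Runs.</summary></member></members></doc>";

Console.WriteLine(ValidateAgainstXsd(goodDoc, sampleXsd).Count); // prints 0
foreach (string problem in ValidateAgainstXsd(damagedDoc, sampleXsd))
    Console.WriteLine(problem);
```

Collecting messages in a list rather than throwing on the first one lets you see every structural problem in a single pass over the edited file.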
Also, XML documentation comments refer to types and code elements, and there are many things a schema won't catch; e.g., verifying that the name attribute of a <param> tag still refers to an actual parameter name in your C# source. The compiler verifies such elements at build time. But if you post-process the XML documentation, you would need a custom tool that had a reference to the original C# source or generated assemblies to re-verify such references.
