PDFiumSharp PDF to Image size

I am using PDFiumSharp to generate JPGs from a PDF file. Here is my code:
using (WebClient client = new WebClient())
{
byte[] pdfData = await client.DownloadDataTaskAsync(pdfUrl);
using (var doc = new PdfDocument(pdfData))
{
int i = 0;
foreach (var page in doc.Pages)
{
using (var bitmap = new PDFiumBitmap((int)page.Width, (int)page.Height, true))
using (var stream = new MemoryStream())
{
page.Render(bitmap);
bitmap.Save(stream);
...
i++;
}
}
}
}
The code works very well and the images are generated accurately. However, each JPG is about 2 MB, so with a multi-page PDF the overall size adds up quickly. Is there any way to reduce the JPG file size? I only need the JPGs for preview purposes, not for printing, so lower resolution or quality is fine.

When you call bitmap.Save(...), the bytes written to the MemoryStream represent a BMP, not a JPG. You need to convert it to JPG yourself.
// Requires: using System.Drawing; using System.Drawing.Imaging; using System.IO; using PDFiumSharp;
public static byte[] Render(PdfDocument pdfDocument, int pageNumber, (int width, int height) outputSize)
{
var page = pdfDocument.Pages[pageNumber];
// Render into a bitmap of the requested size; a smaller bitmap gives a smaller file.
using var thumb = new PDFiumBitmap(outputSize.width, outputSize.height, false);
page.Render(thumb);
using MemoryStream memoryStreamBMP = new();
thumb.Save(memoryStreamBMP);
// Rewind so the decoder reads the BMP from its first byte.
memoryStreamBMP.Position = 0;
using Image imageBmp = Image.FromStream(memoryStreamBMP);
using MemoryStream memoryStreamJPG = new();
imageBmp.Save(memoryStreamJPG, ImageFormat.Jpeg);
return memoryStreamJPG.ToArray();
}
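Since the question only needs previews, lowering the JPEG quality is another way to shrink the files. The sketch below is not part of the original answer: it assumes System.Drawing is available (Windows), and the names JpegPreview and ToJpeg are made up for illustration.

```csharp
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Linq;

public static class JpegPreview
{
    // Encodes an image as JPEG at the given quality (0-100).
    // Lower values trade visual fidelity for a smaller file.
    public static byte[] ToJpeg(Image image, long quality)
    {
        // Locate the built-in JPEG encoder.
        ImageCodecInfo jpegCodec = ImageCodecInfo.GetImageEncoders()
            .First(codec => codec.FormatID == ImageFormat.Jpeg.Guid);

        using var parameters = new EncoderParameters(1);
        parameters.Param[0] = new EncoderParameter(
            System.Drawing.Imaging.Encoder.Quality, quality);

        using var output = new MemoryStream();
        image.Save(output, jpegCodec, parameters);
        return output.ToArray();
    }
}
```

With the Render method above, this would replace the imageBmp.Save(memoryStreamJPG, ImageFormat.Jpeg) call, for example JpegPreview.ToJpeg(imageBmp, 50L). A suitable quality value depends on the content, so it is worth trying a few.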

Related

About analysing the photo with tesseract

I wrote this code to read the numbers in a picture. It runs without any error, but it cannot read the numbers: when I start the program, it shows an empty MessageBox.
I want to read pictures like this:
The code:
private string FotoAnaliz()
{
FileStream fs = new FileStream("D:\\program_goruntusu.jpg", FileMode.OpenOrCreate);
//string fotopath = @"D:\\program_goruntusu.jpg";
Bitmap images = new Bitmap(fs);
using (var engine = new TesseractEngine(@"./tessdata", "eng"))
{
engine.SetVariable("tessedit_char_whitelist", "0123456789");
// have to load Pix via a bitmap since Pix doesn't support loading a stream.
using (var image = new Bitmap(images))
{
using (var pix = PixConverter.ToPix(image))
{
using (var page = engine.Process(pix))
{
sayı = page.GetText();
MessageBox.Show(sayı);
fs.Close();
}
}
}
}
return sayı;
}

Try PSM 10: Treat the image as a single character.
https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc
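In the Tesseract .NET wrapper used in the question, the page segmentation mode can be passed to Process. This is a minimal sketch of that suggestion, assuming the wrapper's PageSegMode enum and Pix.LoadFromFile; the class name DigitOcr and its parameters are hypothetical, and it has not been tried against the asker's image.

```csharp
using Tesseract;

public static class DigitOcr
{
    // Hypothetical helper: reads a single digit from an image file.
    public static string ReadSingleDigit(string imagePath, string tessdataPath)
    {
        using var engine = new TesseractEngine(tessdataPath, "eng");
        engine.SetVariable("tessedit_char_whitelist", "0123456789");

        using var pix = Pix.LoadFromFile(imagePath);
        // PageSegMode.SingleChar corresponds to "--psm 10" on the command line.
        using var page = engine.Process(pix, PageSegMode.SingleChar);
        return page.GetText().Trim();
    }
}
```

If the picture contains several digits in a row, PageSegMode.SingleLine or PageSegMode.SingleWord may be a better fit than SingleChar.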

Resize Webp images

I have converted a JPG to WebP, but I also want to resize the image.
using (Bitmap bitmap = new Bitmap(UploadName +jpgFileName))
{
using (var saveImageStream = System.IO.File.Open(webpFileName, FileMode.Create))
{
var encoder = new SimpleEncoder();
encoder.Encode(bitmap, saveImageStream, 90);
}
}

I think you can resize the JPG first and then convert it from JPG to WebP.
These are the methods I use to resize an image.
You can receive an IFormFile jpgImage and convert it to byte[] using the method below:
public byte[] GetBytes(IFormFile formFile)
{
using var fileStream = formFile.OpenReadStream();
// Copy through a MemoryStream: a single Read call is not guaranteed to fill the buffer.
using var buffer = new MemoryStream();
fileStream.CopyTo(buffer);
return buffer.ToArray();
}
// Methods to resize the jpg image in byte[]
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.PixelFormats;
using SixLabors.ImageSharp.Processing;
...
public byte[] ResizeImage(byte[] img, int width, int height)
{
using var image = SixLabors.ImageSharp.Image.Load<Rgba32>(img);
var options = new ResizeOptions
{
Size = new Size(width, height),
Mode = ResizeMode.Max
};
return Resize(image, options);
}
private byte[] Resize(Image<Rgba32> image, ResizeOptions resizeOptions)
{
byte[] ret;
image.Mutate(i => i.Resize(resizeOptions));
using MemoryStream ms = new MemoryStream();
image.SaveAsJpeg(ms);
ret = ms.ToArray();
ms.Close();
return ret;
}
After resizing the JPG image as byte[], convert it to WebP.

Save a zip file to memory and unzip file from stream and get content

I am currently working on integrating Amazon Prime into our system and am stuck at getting the label back in ZPL format.
Basically, Amazon returns a base64 string. We need to convert that string to a byte array, then save that array as a *.gzip file. From that gzip file, we can extract the content and get the ZPL label content.
My question is: how can we do all of the above without storing any temp files on the system? I have researched some solutions, but none of them works for me.
My current code is below:
var str = "base64string";
var label = Convert.FromBase64String(str);
using (var memoryStream = new MemoryStream())
{
using (var archive = new ZipArchive(memoryStream, ZipArchiveMode.Create, true))
{
var demoFile = archive.CreateEntry("label.zip");
var entryStream = demoFile.Open();
using (var bw = new BinaryWriter(entryStream))
{
bw.Write(label);
}
var data = new MemoryStream();
using (var zip = ZipFile.Read(entryStream))
{
zip["label"].Extract(data);
}
data.Seek(0, SeekOrigin.Begin);
entryStream.Close();
}
using (var fileStream = new FileStream(@"D:\test.zip", FileMode.Create))
{
memoryStream.Seek(0, SeekOrigin.Begin);
memoryStream.CopyTo(fileStream);
}
}
If I save the file as test.zip, I can successfully get the label back. But if I try to extract it directly to another stream, I get an error
A stream from ZipArchiveEntry has been disposed

I've done something similar, taking PNG label data from a zipped web response. This is how I went about that
using (WebClient webClient = new WebClient())
{
// Download. Expect this to be a zip file
byte[] data = webClient.DownloadData(urlString);
MemoryStream memoryStream = new MemoryStream(data);
ZipArchive zipArchive = new ZipArchive(memoryStream);
foreach (var zipEntry in zipArchive.Entries)
{
// Can check file name here and ignore anything in zip we're not expecting
if (!zipEntry.Name.EndsWith(".png")) continue;
// Open zip entry as stream
Stream extractedFile = zipEntry.Open();
// Convert stream to memory stream
MemoryStream extractedMemoryStream = new MemoryStream();
extractedFile.CopyTo(extractedMemoryStream);
// At this point the extractedMemoryStream is a sequence of bytes containing image data.
// In this test project I'm pushing that into a bitmap image, just to see something on screen, but could as easily be written to a file or passed for storage to sql or whatever.
BitmapDecoder decoder = PngBitmapDecoder.Create(extractedMemoryStream, BitmapCreateOptions.None, BitmapCacheOption.OnLoad);
BitmapFrame frame = decoder.Frames.First();
frame.Freeze();
this.LabelImage.Source = frame;
}
}
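The same in-memory approach can be tried without a network call or WPF. The sketch below uses only System.IO.Compression; the class and method names are made up for illustration. It also shows two details that likely matter for the question's code: an archive opened in Create mode has to be disposed before its bytes are complete, and entries are read back from a second archive opened in Read mode.

```csharp
using System.IO;
using System.IO.Compression;

public static class InMemoryZip
{
    // Builds a zip archive in memory containing one entry.
    public static byte[] Create(string entryName, byte[] content)
    {
        using var buffer = new MemoryStream();
        // leaveOpen: true keeps the MemoryStream usable after the archive is disposed.
        using (var archive = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.CreateEntry(entryName);
            using Stream entryStream = entry.Open();
            entryStream.Write(content, 0, content.Length);
        } // Disposing the archive writes the central directory.
        return buffer.ToArray();
    }

    // Reads one entry back out of a zip held in memory.
    public static byte[] Extract(byte[] zipBytes, string entryName)
    {
        using var buffer = new MemoryStream(zipBytes);
        using var archive = new ZipArchive(buffer, ZipArchiveMode.Read);
        ZipArchiveEntry entry = archive.GetEntry(entryName)
            ?? throw new FileNotFoundException($"Entry '{entryName}' not found.");
        using Stream entryStream = entry.Open();
        using var result = new MemoryStream();
        entryStream.CopyTo(result);
        return result.ToArray();
    }
}
```

In the question's code the BinaryWriter disposes the entry stream when its using block ends, which is consistent with the "has been disposed" error on the later read.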

I was overthinking it. I finally found a simple way to do it. We just need to convert the base64 string to a byte array and use GZipStream to decompress it directly. I leave the solution here in case someone needs it. Thanks!
var label = Convert.FromBase64String(str);
using (var compressedStream = new MemoryStream(label))
using (var zipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
using (var resultStream = new MemoryStream())
{
zipStream.CopyTo(resultStream);
return resultStream.ToArray();
}
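A self-contained version of this approach is sketched below so it can be tried without a real Amazon response. It assumes the label text is UTF-8; the class GzipLabel and the EncodeLabel helper are hypothetical and exist only to produce sample input.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipLabel
{
    // Decodes a base64 string and gunzips it into text.
    public static string DecodeLabel(string base64Gzip)
    {
        byte[] compressed = Convert.FromBase64String(base64Gzip);
        using var compressedStream = new MemoryStream(compressed);
        using var gzip = new GZipStream(compressedStream, CompressionMode.Decompress);
        // Assumption: the label is UTF-8 text.
        using var reader = new StreamReader(gzip, Encoding.UTF8);
        return reader.ReadToEnd();
    }

    // Sample-data helper: gzips the text, then base64-encodes it.
    public static string EncodeLabel(string text)
    {
        using var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
        {
            byte[] raw = Encoding.UTF8.GetBytes(text);
            gzip.Write(raw, 0, raw.Length);
        } // Disposing the GZipStream flushes the final block.
        return Convert.ToBase64String(output.ToArray());
    }
}
```

Nothing touches the file system here: every stream is a MemoryStream wrapped by GZipStream.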

LibTiff.Net convert Tiff object to byte array and back

Hi, I can't find a sample for converting a MULTI-PAGE TIFF image to a byte array.
To convert a byte array to a Tiff I use this method:
public static Tiff CreateTiffFromBytes(byte[] bytes)
{
using (var ms = new MemoryStream(bytes))
{
Tiff tiff = Tiff.ClientOpen("in-memory", "r", ms, new TiffStream());
return tiff;
}
}
EDITED:
This method converts a multi-page TIFF image to a byte array. I think the root of the problem is in this method.
//imageIn is tif image with 12 pages
public static byte[] ImageToByteArray(System.Drawing.Image imageIn)
{
using (var ms = new MemoryStream())
{
imageIn.Save(ms, System.Drawing.Imaging.ImageFormat.Tiff);
return ms.ToArray();
}
}
public static List<System.Drawing.Image> GetAllPages(System.Drawing.Image multiTiff)
{
var images = new List<System.Drawing.Image>();
var bitmap = (Bitmap)multiTiff;
int count = bitmap.GetFrameCount(FrameDimension.Page);
for (int idx = 0; idx < count; idx++)
{
bitmap.SelectActiveFrame(FrameDimension.Page, idx);
using (var byteStream = new MemoryStream())
{
bitmap.Save(byteStream, ImageFormat.Tiff);
images.Add(System.Drawing.Image.FromStream(byteStream));
}
}
return images;
}
After conversion to a byte array I lose pages.
The image created from the byte array has only one page.
Image src = Image.FromFile(source);
//imagesInSource.Count (pages) is 12
List<Image> imagesInSource = GetAllPages(src);
byte[] imageData = ImageToByteArray(src);
Image des = ImageConvert.ByteArrayToImage(imageData);
//imagesInDes.Count (pages) is 1
List<Image> imagesInDes = GetAllPages(des);

I am not sure why you can't send the TIFF file to the service. The file is just bytes, after all.
Your code in the first snippet is incorrect because you dispose the memory stream that is passed to the Tiff object. You shouldn't do that; the Tiff object will dispose the stream itself.
EDIT:
In the third snippet you create an image for each page of the System.Drawing.Image, but then you convert only the first image to a byte array. You should use something like:
List<byte[]> imagesBytes = new List<byte[]>();
foreach (Image img in imagesInSource)
{
// Convert the current page, not the original multi-page image.
byte[] imageData = ImageToByteArray(img);
imagesBytes.Add(imageData);
}
Then you should send imagesBytes to your server and create several TIFF images from that.
Anyway, it seems you should think more about what you are really trying to do, because for now it is unclear to me what all these conversions are for.
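If a single multi-page TIFF in one byte array is what is wanted, System.Drawing can also write the pages as frames of one file. This is a different technique from the per-page list suggested above, shown only as a sketch: it assumes System.Drawing on Windows, and the name MultiPageTiff is made up. A plain Save with ImageFormat.Tiff appears to write only the active frame, which would match the page loss described in the question.

```csharp
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Linq;

public static class MultiPageTiff
{
    // Writes all pages into one multi-frame TIFF and returns its bytes.
    public static byte[] Save(IReadOnlyList<Image> pages)
    {
        ImageCodecInfo tiffCodec = ImageCodecInfo.GetImageEncoders()
            .First(codec => codec.FormatID == ImageFormat.Tiff.Guid);
        var saveFlag = System.Drawing.Imaging.Encoder.SaveFlag;

        using var output = new MemoryStream();
        using var parameters = new EncoderParameters(1);

        // The first page opens the multi-frame file.
        parameters.Param[0] = new EncoderParameter(saveFlag, (long)EncoderValue.MultiFrame);
        pages[0].Save(output, tiffCodec, parameters);

        // Each further page is appended as a new frame.
        parameters.Param[0] = new EncoderParameter(saveFlag, (long)EncoderValue.FrameDimensionPage);
        for (int i = 1; i < pages.Count; i++)
        {
            pages[0].SaveAdd(pages[i], parameters);
        }

        // Flush closes the multi-frame file.
        parameters.Param[0] = new EncoderParameter(saveFlag, (long)EncoderValue.Flush);
        pages[0].SaveAdd(parameters);

        return output.ToArray();
    }
}
```

When the TIFF already exists on disk, reading the file's bytes directly (for example with File.ReadAllBytes) keeps every page and avoids re-encoding altogether.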

C# WPF convert BitmapImage pasted in richtextbox to binary

I've got a richtextbox, that I plan on saving to a database, which can be loaded back into the same richtextbox. I've got it working so that I can save the flowdocument as DataFormats.XamlPackage, which saves the images, but the issue is that the text isn't searchable. With DataFormats.Xaml, I've got the text of course, but no images. The images will be pasted in by the end user, not images included with the application.
I tried using XamlWriter to get the text into XML, and then grab the images from the document separately and insert them as binary into the XML, but I can't seem to find a way to get the images to binary...
Does anyone have ideas on how to get the images into binary, separate from the text?
Thanks in advance!
GetImageByteArray() is where the issue is.
Code:
private void SaveXML()
{
TextRange documentTextRange = new TextRange(richTextBox.Document.ContentStart, richTextBox.Document.ContentEnd);
FlowDocument flowDocument = richTextBox.Document;
using (StringWriter stringwriter = new StringWriter())
{
using (System.Xml.XmlWriter writer = System.Xml.XmlWriter.Create(stringwriter))
{
XamlWriter.Save(flowDocument, writer );
}
testRTF t = new testRTF();
t.RtfText = new byte[0];
t.RtfXML = GetImagesXML(flowDocument);
t.RtfFullText = stringwriter.ToString();
//save t to database
}
richTextBox.Document.Blocks.Clear();
}
private string GetImagesXML(FlowDocument flowDocument)
{
using (StringWriter stringwriter = new StringWriter())
{
using (System.Xml.XmlWriter writer = System.Xml.XmlWriter.Create(stringwriter))
{
Type inlineType;
InlineUIContainer uic;
System.Windows.Controls.Image replacementImage;
byte[] bytes;
System.Text.ASCIIEncoding enc;
//loop through replacing images in the flowdoc with the byte versions
foreach (Block b in flowDocument.Blocks)
{
foreach (Inline i in ((Paragraph)b).Inlines)
{
inlineType = i.GetType();
if (inlineType == typeof(Run))
{
//The inline is TEXT!!!
}
else if (inlineType == typeof(InlineUIContainer))
{
//The inline has an object, likely an IMAGE!!!
uic = ((InlineUIContainer)i);
//if it is an image
if (uic.Child.GetType() == typeof(System.Windows.Controls.Image))
{
//grab the image
replacementImage = (System.Windows.Controls.Image)uic.Child;
//get its byte array
bytes = GetImageByteArray((BitmapImage)replacementImage.Source);
//write the element
writer.WriteStartElement("Image");
//put the bytes into the tag
enc = new System.Text.ASCIIEncoding();
writer.WriteString(enc.GetString(bytes));
//close the element
writer.WriteEndElement();
}
}
}
}
}
return stringwriter.ToString();
}
}
//This function is where the problem is, i need a way to get the byte array
private byte[] GetImageByteArray(BitmapImage bi)
{
byte[] result = new byte[0];
using (MemoryStream ms = new MemoryStream())
{
XamlWriter.Save(bi, ms);
//result = new byte[ms.Length];
result = ms.ToArray();
}
return result;
}
UPDATE
I think I may have finally found a solution, which I will post below. It uses BmpBitmapEncoder and BmpBitmapDecoder. This allows me to get binary from the bitmap image, store it to the database, and load it back up and display it right back into the FlowDocument. Initial tests have proven successful. For testing purposes I'm bypassing my database step and basically duplicating the image by creating binary, then taking the binary and turning it into a new image and adding it to the FlowDocument. The only issue is that when I take the modified FlowDocument and use the XamlWriter.Save function, it errors on the newly created Image with "Cannot serialize a non-public type 'System.Windows.Media.Imaging.BitmapFrameDecode'". This will take some further investigation. I'll have to leave it alone for now though.
private void SaveXML()
{
TextRange documentTextRange = new TextRange(richTextBox.Document.ContentStart, richTextBox.Document.ContentEnd);
FlowDocument flowDocument = richTextBox.Document;
string s = GetImagesXML(flowDocument);//temp
LoadImagesIntoXML(s);
using (StringWriter stringwriter = new StringWriter())
{
using (System.Xml.XmlWriter writer = System.Xml.XmlWriter.Create(stringwriter))
{
XamlWriter.Save(flowDocument, writer );//Throws error here
}
}
}
private string GetImagesXML(FlowDocument flowDocument)
{
string s= "";
using (StringWriter stringwriter = new StringWriter())
{
Type inlineType;
InlineUIContainer uic;
System.Windows.Controls.Image replacementImage;
byte[] bytes;
BitmapImage bi;
//loop through replacing images in the flowdoc with the byte versions
foreach (Block b in flowDocument.Blocks)
{
foreach (Inline i in ((Paragraph)b).Inlines)
{
inlineType = i.GetType();
if (inlineType == typeof(Run))
{
//The inline is TEXT!!!
}
else if (inlineType == typeof(InlineUIContainer))
{
//The inline has an object, likely an IMAGE!!!
uic = ((InlineUIContainer)i);
//if it is an image
if (uic.Child.GetType() == typeof(System.Windows.Controls.Image))
{
//grab the image
replacementImage = (System.Windows.Controls.Image)uic.Child;
bi = (BitmapImage)replacementImage.Source;
//get its byte array
bytes = GetImageByteArray(bi);
s = Convert.ToBase64String(bytes);//temp
}
}
}
}
return s;
}
}
private byte[] GetImageByteArray(BitmapImage src)
{
MemoryStream stream = new MemoryStream();
BmpBitmapEncoder encoder = new BmpBitmapEncoder();
encoder.Frames.Add(BitmapFrame.Create((BitmapSource)src));
encoder.Save(stream);
stream.Flush();
return stream.ToArray();
}
private void LoadImagesIntoXML(string xml)
{
byte[] imageArr = Convert.FromBase64String(xml);
System.Windows.Controls.Image img = new System.Windows.Controls.Image();
MemoryStream stream = new MemoryStream(imageArr);
BmpBitmapDecoder decoder = new BmpBitmapDecoder(stream, BitmapCreateOptions.None, BitmapCacheOption.Default);
img.Source = decoder.Frames[0];
img.Stretch = Stretch.None;
Paragraph p = new Paragraph();
p.Inlines.Add(img);
richTextBox.Document.Blocks.Add(p);
}

Good news. I had to work on something else for a while, but that let me come back with a fresh pair of eyes, and I quickly realized that I could just combine what I knew was working. I doubt this solution will win any awards, but it works.
I know that XamlWriter can serialize a FlowDocument as text, keeping the image elements but losing the image data. I also know that I can turn a FlowDocument into binary by saving it in the XamlPackage format. So the idea is to iterate through the FlowDocument with a function I had already written to find the images, clone each image, and put the clone into a new FlowDocument. That new FlowDocument, which contains only the single image, is saved as XamlPackage binary, converted to a base64 string, and stored in the Tag property of the image in the original FlowDocument. This keeps the image data in the original FlowDocument as text, so I can pass the FlowDocument with image data (which I call SUBString format) to XamlWriter and get searchable text.
When it comes out of the database, I load the FlowDocument from the XAML as normal, then iterate through each image, decode the XamlPackage data from its Tag property, and create another clone image to provide the Source property for the actual image. The steps to get to SUBString format are below.
/// <summary>
/// Returns a FlowDocument in SearchableText UI Binary (SUB)String format.
/// </summary>
/// <param name="flowDocument">The FlowDocument containing images/UI formats to be converted</param>
/// <returns>Returns a string representation of the FlowDocument with images in base64 string in image tag property</returns>
private string ConvertFlowDocumentToSUBStringFormat(FlowDocument flowDocument)
{
//take the flow document and change all of its images into a base64 string
FlowDocument fd = TransformImagesTo64(flowDocument);
//apply the XamlWriter to the newly transformed flowdocument
using (StringWriter stringwriter = new StringWriter())
{
using (System.Xml.XmlWriter writer = System.Xml.XmlWriter.Create(stringwriter))
{
XamlWriter.Save(flowDocument, writer);
}
return stringwriter.ToString();
}
}
/// <summary>
/// Returns a FlowDocument with images in base64 stored in their own tag property
/// </summary>
/// <param name="flowDocument">The FlowDocument containing images/UI formats to be converted</param>
/// <returns>Returns a FlowDocument with images in base 64 string in image tag property</returns>
private FlowDocument TransformImagesTo64(FlowDocument flowDocument)
{
FlowDocument img_flowDocument;
Paragraph img_paragraph;
InlineUIContainer img_inline;
System.Windows.Controls.Image newImage;
Type inlineType;
InlineUIContainer uic;
System.Windows.Controls.Image replacementImage;
//loop through replacing images in the flowdoc with the base64 versions
foreach (Block b in flowDocument.Blocks)
{
//loop through inlines looking for images
foreach (Inline i in ((Paragraph)b).Inlines)
{
inlineType = i.GetType();
/*if (inlineType == typeof(Run))
{
//The inline is TEXT!!! $$$$$ Kept in case needed $$$$$
}
else */if (inlineType == typeof(InlineUIContainer))
{
//The inline has an object, likely an IMAGE!!!
uic = ((InlineUIContainer)i);
//if it is an image
if (uic.Child.GetType() == typeof(System.Windows.Controls.Image))
{
//grab the image
replacementImage = (System.Windows.Controls.Image)uic.Child;
//create a new image to be used to get base64
newImage = new System.Windows.Controls.Image();
//clone the image from the image in the flowdocument
newImage.Source = replacementImage.Source;
//create necessary objects to obtain a flowdocument in XamlFormat to get base 64 from
img_inline = new InlineUIContainer(newImage);
img_paragraph = new Paragraph(img_inline);
img_flowDocument = new FlowDocument(img_paragraph);
//Get the base 64 version of the XamlFormat binary
replacementImage.Tag = TransformImageTo64String(img_flowDocument);
}
}
}
}
return flowDocument;
}
/// <summary>
/// Takes a FlowDocument containing a SINGLE Image, and converts to base 64 using XamlFormat
/// </summary>
/// <param name="flowDocument">The FlowDocument containing a SINGLE Image</param>
/// <returns>Returns base 64 representation of image</returns>
private string TransformImageTo64String(FlowDocument flowDocument)
{
TextRange documentTextRange = new TextRange(flowDocument.ContentStart, flowDocument.ContentEnd);
using (MemoryStream ms = new MemoryStream())
{
documentTextRange.Save(ms, DataFormats.XamlPackage);
ms.Position = 0;
return Convert.ToBase64String(ms.ToArray());
}
}

Save your image to a MemoryStream and write that stream's contents to your XML file.
Calling ToArray() on the MemoryStream gives you the byte[].

Here is sample code for both of the suggestions I have already made. I'll have to look into the payload issue if my examples don't work...
// get raw bytes from BitmapImage using BaseUri and UriSource
private byte[] GetImageByteArray(BitmapImage bi)
{
string strImagePath = Path.Combine(Path.GetDirectoryName(bi.BaseUri.OriginalString), bi.UriSource.OriginalString);
// Read the encoded image file as-is (the original called Write here, which would have overwritten the file).
return File.ReadAllBytes(strImagePath);
}
// get raw bytes from BitmapImage using BitmapImage.CopyPixels
private byte[] GetImageByteArray(BitmapSource bi)
{
int rawStride = (bi.PixelWidth * bi.Format.BitsPerPixel + 7) / 8;
byte[] result = new byte[rawStride * bi.PixelHeight];
bi.CopyPixels(result, rawStride, 0);
return result;
}
private BitmapSource GetImageFromByteArray(byte[] pixelInfo, int height, int width)
{
PixelFormat pf = PixelFormats.Bgr32;
int stride = (width * pf.BitsPerPixel + 7) / 8;
BitmapSource image = BitmapSource.Create(width, height, 96, 96, pf, null, pixelInfo, stride);
return image;
}
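The stride expression in both methods is the number of bytes per pixel row, rounded up to a whole byte. A small sketch of just that arithmetic follows; the class name Stride is made up for illustration.

```csharp
using System;

public static class Stride
{
    // Bytes per row for tightly packed pixel data, rounding partial bytes up.
    public static int Compute(int pixelWidth, int bitsPerPixel)
    {
        if (pixelWidth <= 0 || bitsPerPixel <= 0)
            throw new ArgumentException("Width and bits per pixel must be positive.");
        return (pixelWidth * bitsPerPixel + 7) / 8;
    }
}
```

For example, a 10-pixel-wide row takes 40 bytes at 32 bits per pixel, 2 bytes at 1 bit per pixel, and a 3-pixel-wide row takes 9 bytes at 24 bits per pixel. Note that GetImageFromByteArray above hard-codes PixelFormats.Bgr32, so the round trip only matches when the source bitmap also uses 32 bits per pixel.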
