File.Exists false when created by tesseract

I use Tesseract to get text from a captcha image.
I use this code:
Process p = new Process();
p.StartInfo.FileName = Server.MapPath("~/app/tesseract.exe");
p.StartInfo.Arguments = imgSavePath + " " + txtSavePath;
p.Start();
p.WaitForExit();
bool exist = File.Exists(txtSavePath);
The txtSavePath file is created (I can see it in Windows Explorer), and I can open it and read the text in it. But the exist variable is false, which is very strange.
Can anybody tell me why? And how can I use a StreamReader to read the text in the created file?

Tesseract treats the output argument as a base name and appends a ".txt" extension to it, so in your case it should be:
bool exist = File.Exists(txtSavePath + ".txt");
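Since the question also asks how to read the result with a StreamReader, here is a minimal sketch. The only behaviour taken from the answer is that Tesseract adds ".txt" to the output name; TesseractOutput, TextFilePath, BaseName and ReadText are hypothetical helper names, and stripping a ".txt" the caller already included is my own assumption about what you want.

```csharp
using System;
using System.IO;

public static class TesseractOutput
{
    // Tesseract appends ".txt" to the output base name it is given,
    // so the base "C:\out\result" ends up as "C:\out\result.txt".
    public static string TextFilePath(string outputBase) => outputBase + ".txt";

    // Hypothetical helper: removes a trailing ".txt" so that passing "result.txt"
    // to Tesseract does not produce "result.txt.txt".
    public static string BaseName(string desiredTxtPath) =>
        desiredTxtPath.EndsWith(".txt", StringComparison.OrdinalIgnoreCase)
            ? desiredTxtPath.Substring(0, desiredTxtPath.Length - 4)
            : desiredTxtPath;

    // Reads the recognised text with a StreamReader, or returns null
    // if Tesseract did not produce the file.
    public static string ReadText(string outputBase)
    {
        string path = TextFilePath(outputBase);
        if (!File.Exists(path))
            return null;

        using (var reader = new StreamReader(path))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Usage: start Tesseract with p.StartInfo.Arguments = "\"" + imgSavePath + "\" \"" + TesseractOutput.BaseName(txtSavePath) + "\""; (quoting keeps paths with spaces intact), and after p.WaitForExit() call TesseractOutput.ReadText(TesseractOutput.BaseName(txtSavePath)).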

Related

Ghostscript is increasing file size after compressing

I use the following method to compress the PDF:
private bool CompressPDF(string Input, string Output, string CompressValue)
{
try
{
Process proc = new Process();
ProcessStartInfo psi = new ProcessStartInfo();
psi.CreateNoWindow = true;
psi.ErrorDialog = false;
psi.UseShellExecute = false;
psi.WindowStyle = ProcessWindowStyle.Hidden;
psi.FileName = string.Concat(Path.GetDirectoryName(Application.ExecutablePath), "\\ghost.exe");
string args = "-sDEVICE=pdfwrite -dCompatibilityLevel=1.4" + " -dPDFSETTINGS=/" + CompressValue + " -dNOPAUSE -dQUIET -dBATCH" + " -sOutputFile=\"" + Output + "\" " + "\"" + Input + "\"";
psi.Arguments = args;
//start the execution
proc.StartInfo = psi;
proc.Start();
proc.WaitForExit();
return true;
}
catch
{
return false;
}
}
I set the PDF settings to "Printer" by default. I can't figure out why the file size of my PDF files sometimes increases.

Ghostscript (more accurately its pdfwrite device) doesn't 'compress' files.
By judicious use of settings that, for example, downsample images to trade quality for file size, it is possible to produce a smaller file, but there is absolutely no guarantee that this will be the case.
Without seeing the input file, there is no way to say why your file increases in size, but (for example) a PDF 1.5 file can use compressed object streams and cross-reference streams, and the pdfwrite device never uses those (your command line also asks for -dCompatibilityLevel=1.4, which predates them), so that could be one reason.
The canned PDFSETTINGS cover a multitude of different controls; you should read up on those and understand what is actually going on. If your original file has already traded quality for size, then it's entirely likely that the /printer settings (which are reasonably conservative) will not do anything at all.
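Because pdfwrite rewrites the document rather than compressing it, one pragmatic workaround (my assumption, not something the answer prescribes) is to compare sizes after Ghostscript exits and keep the original when the rewritten file is not smaller. PdfSizeGuard and KeepSmaller are hypothetical names.

```csharp
using System.IO;

public static class PdfSizeGuard
{
    // Call after Ghostscript has finished writing outputPath.
    // Returns true if the Ghostscript output was smaller and has been kept;
    // otherwise overwrites outputPath with a copy of the original and returns false.
    public static bool KeepSmaller(string inputPath, string outputPath)
    {
        long inputSize = new FileInfo(inputPath).Length;
        long outputSize = new FileInfo(outputPath).Length;

        if (outputSize < inputSize)
            return true;

        // The rewritten file is not smaller, so fall back to the original bytes.
        File.Copy(inputPath, outputPath, overwrite: true);
        return false;
    }
}
```

In CompressPDF this would go after proc.WaitForExit(), e.g. PdfSizeGuard.KeepSmaller(Input, Output);.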

How can I get Bitdefender (antivirus) to scan a file and delete it if the file contains a virus?

This is the code I use to write my file to the App_Data folder:
var filename = Path.Combine(Server.MapPath("~/App_Data"), thefilename);
var ms = new MemoryStream();
file.InputStream.CopyTo(ms);
file.InputStream.Position = 0;
byte[] contents = ms.ToArray();
using (var fileStream = new System.IO.FileStream(filename, System.IO.FileMode.Create,
System.IO.FileAccess.ReadWrite))
{
fileStream.Write(contents, 0, contents.Length);
}
This writes the file fine. However, if the file contains a virus, Bitdefender does not delete it unless I go onto the IIS server and manually try to open or move the file. If I do that, it is deleted instantly.
If I copy and paste the test virus file into the App_Data folder directly, Bitdefender removes it instantly.
I have tried various ways to read/move the file with System.IO.File.Move/Open/ReadAllLines, yet nothing triggers Bitdefender to remove the file.
The only thing I got to work was starting a new process to open the file. However, I don't want to do that on the server, so I am looking for a different solution. This is the code I used to open the file, which does cause Bitdefender to scan and remove the infected file:
Process cmd = new Process();
cmd.StartInfo.FileName = filename;
cmd.Start();
A solution with System.IO.File.Open would be best for me in this situation, but I cannot figure out why it isn't working. Alternatively, a way to trigger Bitdefender to scan the file instantly would also be a viable solution.

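I have solved the issue with the help of @sheavens and the following code:
Process cmd = new Process();
ProcessStartInfo startInfo = new ProcessStartInfo();
startInfo.WindowStyle = ProcessWindowStyle.Hidden;
startInfo.FileName = @"C:\Program Files\Bitdefender\Endpoint Security\product.console.exe";
var args = String.Format("/c FileScan.OnDemand.RunScanTask custom path=\"{0}\" infectedAction1=delete", filename);
startInfo.Arguments = args;
cmd.StartInfo = startInfo;
cmd.StartInfo.UseShellExecute = false;
cmd.StartInfo.RedirectStandardOutput = true;
cmd.StartInfo.RedirectStandardError = true;
var result = cmd.Start();
// Drain both redirected streams (stdout asynchronously) so the scanner cannot block on a full pipe,
// then wait for the scan to finish before relying on the file having been deleted.
var outputTask = cmd.StandardOutput.ReadToEndAsync();
string errors = cmd.StandardError.ReadToEnd();
string output = outputTask.Result;
cmd.WaitForExit();
This starts a new process that runs the Bitdefender console executable with a command to scan the file at the provided path and delete it if it is infected.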
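If you wrap the call above in a helper, two pieces can be checked without Bitdefender installed: building the argument string (copied verbatim from the answer, not verified against other Bitdefender versions) and checking afterwards whether the file is gone. BitdefenderScan, BuildArguments and WasRemoved are hypothetical names, and treating a missing file as "infected and deleted" is an assumption that only holds once the scan process has exited.

```csharp
using System;
using System.IO;

public static class BitdefenderScan
{
    // Same command text as the answer: on-demand scan of one path, delete if infected.
    public static string BuildArguments(string filePath) =>
        String.Format("/c FileScan.OnDemand.RunScanTask custom path=\"{0}\" infectedAction1=delete", filePath);

    // Assumption: call only after WaitForExit on the scan process; with
    // infectedAction1=delete, a file that no longer exists is taken to have been removed.
    public static bool WasRemoved(string filePath) => !File.Exists(filePath);
}
```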

Load standard output and parse it to Image

I have the Graphviz dot.exe file, which I call with the parameter -Tpng (the output type is PNG, but I don't care whether it is PNG, BMP, or any other format). I start it from C# code:
ProcessStartInfo psi = new ProcessStartInfo();
psi.FileName = path;
psi.UseShellExecute = false;
psi.Arguments = "-Tpng";
psi.RedirectStandardInput = true;
psi.RedirectStandardOutput = true;
psi.CreateNoWindow = true;
Process p = Process.Start(psi);
Then I write the input:
p.StandardInput.WriteLine(input);
input is defined earlier; it's a string. The input is valid (tested manually).
Then I need to read the output that Graphviz prints to standard output and parse it into an Image.
I've tried to read it into a memory stream, but I was either unable to read it or, after reading, the memory stream was locked (it threw an exception when I tried Image.FromStream(myMemoryStream);).
I was able to load the output into a string:
string output = "";
while (true)
{
string newOutput = p.StandardOutput.ReadLine();
output += newOutput;
if (newOutput == String.Empty)
break;
}
I've tried to parse this string as described in this answer, but it threw an exception (string is not valid).
How can I get Image from the dot.exe output?

From the comments, it seems the program expects StandardInput to be finished before it returns the content. Close StandardInput to achieve this:
p.StandardInput.WriteLine(input);
p.StandardInput.BaseStream.Close();
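The whole round trip might look like this minimal sketch, assuming dot writes the complete PNG to stdout once its input is closed. The PNG is binary, so it is copied from StandardOutput.BaseStream; reading it line by line as text (as in the question) corrupts it. DotRenderer, RenderPng and ReadAllBytes are hypothetical names, and dotPath is whatever path you already pass as psi.FileName.

```csharp
using System.Diagnostics;
using System.IO;

public static class DotRenderer
{
    // Copies a binary stream (e.g. Process.StandardOutput.BaseStream) into a byte array.
    public static byte[] ReadAllBytes(Stream source)
    {
        using (var buffer = new MemoryStream())
        {
            source.CopyTo(buffer);
            return buffer.ToArray();
        }
    }

    // Runs dot with -Tpng, feeds it the DOT source, closes stdin so dot knows
    // the input is complete, then returns the raw PNG bytes written to stdout.
    public static byte[] RenderPng(string dotPath, string dotSource)
    {
        var psi = new ProcessStartInfo(dotPath, "-Tpng")
        {
            UseShellExecute = false,
            RedirectStandardInput = true,
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };

        using (Process p = Process.Start(psi))
        {
            p.StandardInput.Write(dotSource);
            p.StandardInput.Close(); // end of input: dot can now render
            byte[] png = ReadAllBytes(p.StandardOutput.BaseStream);
            p.WaitForExit();
            return png;
        }
    }
}
```

To get an Image, wrap the bytes in a new MemoryStream(bytes) and pass it to Image.FromStream. The Image.FromStream documentation says the stream must stay open for the lifetime of the Image, so don't dispose it early; and if you fill a MemoryStream yourself, reset its Position to 0 before calling FromStream.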

How to use asp.net and phantomjs to take a screen shot of a page and return it to the client

I want to be able to capture a screenshot of a website, and am attempting to use PhantomJS and ASP.NET.
I have tried using page.render, which saves the screenshot to a file. It works as a console application, but not when I call it from an ASP.NET handler. It is probably due to file permissions, since simple scripts (like hello.js) work fine.
That is okay; my preference would be not to write to a file, but to deal with the bytes and return an image directly from the handler.
I am a bit lost as to how to do that. I noticed a method called page.renderBase64, but do not know how to use it.
Currently I am using an IHttpHandler.
There is a similar question here, but that person eventually dropped PhantomJS. I like the look of it and want to continue using it if possible.
Running Phantomjs using C# to grab snapshot of webpage

According to your last comment, you can do the following in the PhantomJS script:
var base64image = page.renderBase64('PNG');
system.stdout.write(base64image);
In C#:
var startInfo = new ProcessStartInfo {
//some other parameters here
//...
FileName = pathToExe,
Arguments = String.Format("{0}",someParameters),
UseShellExecute = false,
CreateNoWindow = true,
RedirectStandardOutput = true,
RedirectStandardError = true,
RedirectStandardInput = true,
WorkingDirectory = pdfToolPath
};
var p = new Process();
p.StartInfo = startInfo;
p.Start();
//Start reading the Output asynchronously: waiting for exit (or draining one pipe) before
//reading the other can deadlock once a pipe buffer fills, which a large base64 image easily does.
var outputTask = p.StandardOutput.ReadToEndAsync();
//Read the Error:
string error = p.StandardError.ReadToEnd();
//Read the Output:
string output = outputTask.Result;
p.WaitForExit(timeToExit);
Your output variable then contains the base64 string returned from PhantomJS, and you can do what you have planned with it.

Use the PhantomJS wrapper from here: nreco wrapper
You can get the JS for rasterizing here: rasterize
The following C# code would then do the job:
var phantomJS=new PhantomJS();
phantomJS.Run("rasterize.js", new[] { "http://google.com","ss.pdf" });

This question stemmed from my lack of understanding of what a base64 string actually was.
In the JavaScript file that PhantomJS runs, I can write the base64 image directly to the console like so:
var base64image = page.renderBase64('PNG');
console.log(base64image);
In the C# code that runs PhantomJS, I can convert the console output back to bytes and write the image to the response, like so:
var info = new ProcessStartInfo(path, string.Join(" ", args));
info.RedirectStandardInput = true;
info.RedirectStandardOutput = true;
info.UseShellExecute = false;
info.CreateNoWindow = true;
// Process.Start(info) already starts the process; no extra p.Start() call is needed
var p = Process.Start(info);
var base64image = p.StandardOutput.ReadToEnd();
// FromBase64String ignores whitespace such as the newline that console.log appends
var bytes = Convert.FromBase64String(base64image);
p.WaitForExit();
context.Response.ContentType = "image/png";
context.Response.OutputStream.Write(bytes, 0, bytes.Length);
This seems to avoid file locking issues I was having.
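The decoding step can be pulled out into a small helper that is easy to test on its own. PhantomOutput, DecodeImage and LooksLikePng are hypothetical names, and the PNG signature check is an extra sanity check I've added, not something PhantomJS requires.

```csharp
using System;

public static class PhantomOutput
{
    // Every PNG file starts with these eight bytes.
    private static readonly byte[] PngSignature = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    // Turns the text PhantomJS printed (base64 plus console.log's trailing newline)
    // back into raw image bytes.
    public static byte[] DecodeImage(string consoleOutput) =>
        Convert.FromBase64String(consoleOutput.Trim());

    // Checks the signature before the bytes are sent with ContentType "image/png".
    public static bool LooksLikePng(byte[] data)
    {
        if (data.Length < PngSignature.Length)
            return false;

        for (int i = 0; i < PngSignature.Length; i++)
        {
            if (data[i] != PngSignature[i])
                return false;
        }
        return true;
    }
}
```

If LooksLikePng returns false, the output probably contains something other than the image (for example an error message from the script), which is worth logging instead of writing to the response.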

Using CasperJS coupled with PhantomJS, I've been getting beautiful shots of webpages.
var casper = require('casper').create();
casper.start('http://target.aspx', function() {
this.capture('snapshot.png');
});
casper.run(function() {
this.echo('finished');
});
I highly recommend you check out that tool. I'm still not sure how to do the postbacks, though.

Set the WorkingDirectory property of the ProcessStartInfo object to specify where the file is saved (a relative path passed to page.render is resolved against that directory).

Print existing PDF (or other files) in C#

From an application I'm building I need to print existing PDFs (created by another app).
How can I do this in C# and provide a mechanism so the user can select a different printer or other properties?
I've looked at the PrintDialog, but I'm not sure what file it is attempting to print, if any, because the output is always a blank page. Maybe I'm just missing something there.
Do I need to use "iTextSharp" (as suggested elsewhere)? That seems odd to me, since I can "send the file to the printer"; I just don't have a nice dialog beforehand to set the printer etc. I don't really want to write a printing dialog from the ground up, but a lot of the examples I found by searching seem to do just that.
Any advice, examples or sample code would be great!
Also, if PDF is the issue, the other app could create the files in a different format, such as bitmap or PNG, if that makes things easier.

Display a little dialog with a combobox whose Items are set to the string collection returned by PrinterSettings.InstalledPrinters.
If you can make it a requirement that GSView be installed on the machine, you can then silently print the PDF. It's a little slow and roundabout but at least you don't have to pop up Acrobat.
Here's some code I use to print out some PDFs that I get back from a UPS Web service:
private void PrintFormPdfData(byte[] formPdfData)
{
string tempFile;
tempFile = Path.GetTempFileName();
using (FileStream fs = new FileStream(tempFile, FileMode.Create))
{
fs.Write(formPdfData, 0, formPdfData.Length);
fs.Flush();
}
try
{
string gsArguments;
string gsLocation;
ProcessStartInfo gsProcessInfo;
Process gsProcess;
gsArguments = string.Format("-grey -noquery -printer \"HP LaserJet 5M\" \"{0}\"", tempFile);
gsLocation = @"C:\Program Files\Ghostgum\gsview\gsprint.exe";
gsProcessInfo = new ProcessStartInfo();
gsProcessInfo.WindowStyle = ProcessWindowStyle.Hidden;
gsProcessInfo.FileName = gsLocation;
gsProcessInfo.Arguments = gsArguments;
gsProcess = Process.Start(gsProcessInfo);
gsProcess.WaitForExit();
}
finally
{
File.Delete(tempFile);
}
}
As you can see, it takes the PDF data as a byte array, writes it to a temp file, and launches gsprint.exe to print the file silently to the named printer ("HP LaserJet 5M"). You could replace the printer name with whatever the user chose in your dialog box.
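To plug in the printer chosen in the dialog, the hard-coded argument string can be built from parameters. A minimal sketch using only the -grey, -noquery and -printer switches from the code above; GsPrint and BuildArguments are hypothetical names, and assuming gsprint simply falls back to its default colour mode when -grey is omitted.

```csharp
using System;

public static class GsPrint
{
    // Same shape as the answer's gsArguments, with the printer name and file supplied by the caller.
    public static string BuildArguments(string printerName, string pdfPath, bool grey = true)
    {
        string colour = grey ? "-grey " : "";
        return String.Format("{0}-noquery -printer \"{1}\" \"{2}\"", colour, printerName, pdfPath);
    }
}
```

For example, gsArguments = GsPrint.BuildArguments(comboBoxPrinters.Text, tempFile); where comboBoxPrinters is a hypothetical combobox filled from PrinterSettings.InstalledPrinters.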
Printing a PNG or GIF would be much easier -- just extend the PrintDocument class and use the normal print dialog provided by Windows Forms.
Good luck!

Although this is VB, you can easily translate it. By the way, Adobe does not pop up; it only prints the PDF and then goes away.
''' <summary>
''' Start Adobe Process to print document
''' </summary>
''' <param name="p"></param>
''' <remarks></remarks>
Private Function printDoc(ByVal p As PrintObj) As PrintObj
Dim myProcess As New Process()
Dim myProcessStartInfo As New ProcessStartInfo(adobePath)
Dim errMsg As String = String.Empty
Dim outFile As String = String.Empty
myProcessStartInfo.UseShellExecute = False
myProcessStartInfo.RedirectStandardOutput = True
myProcessStartInfo.RedirectStandardError = True
Try
If canIprintFile(p.sourceFolder & p.sourceFileName) Then
isAdobeRunning(p)'Make sure Adobe is not running; wait till it's done
Try
myProcessStartInfo.Arguments = " /t " & """" & p.sourceFolder & p.sourceFileName & """" & " " & """" & p.destination & """"
myProcess.StartInfo = myProcessStartInfo
myProcess.Start()
myProcess.CloseMainWindow()
isAdobeRunning(p)
myProcess.Dispose()
Catch ex As Exception
End Try
p.result = "OK"
Else
p.result = "The file that the Document Printer is trying to print is missing."
sendMailNotification("The file that the Document Printer is trying to print" & vbCrLf & _
"is missing. The file in question is: " & vbCrLf & _
p.sourceFolder & p.sourceFileName, p)
End If
Catch ex As Exception
p.result = ex.Message
sendMailNotification(ex.Message, p)
Finally
myProcess.Dispose()
End Try
Return p
End Function
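A partial C# translation of the core of printDoc, assuming adobePath points at the Acrobat or Reader executable. Only the /t argument line and the ProcessStartInfo settings are translated; AcrobatPrint, BuildArguments and CreateStartInfo are hypothetical names, and the PrintObj, canIprintFile, isAdobeRunning and sendMailNotification helpers from the VB code are not reproduced.

```csharp
using System.Diagnostics;

public static class AcrobatPrint
{
    // Equivalent of: " /t " & """" & file & """" & " " & """" & printer & """"
    public static string BuildArguments(string filePath, string printerName) =>
        "/t \"" + filePath + "\" \"" + printerName + "\"";

    // Same settings as myProcessStartInfo in the VB function.
    public static ProcessStartInfo CreateStartInfo(string adobePath, string filePath, string printerName) =>
        new ProcessStartInfo(adobePath, BuildArguments(filePath, printerName))
        {
            UseShellExecute = false,
            RedirectStandardOutput = true,
            RedirectStandardError = true
        };
}
```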

You will need Acrobat or some other application that can print the PDF. From there, you P/Invoke ShellExecute to print the document.

You could also use PDFsharp, an open-source library for creating and manipulating PDFs.
http://www.pdfsharp.net/

I'm doing the same thing for my project, and it worked for me.
See if it can help you:
Process p = new Process();
p.EnableRaisingEvents = true; //Important line of code: raises the Exited event when the process finishes
p.StartInfo = new ProcessStartInfo()
{
CreateNoWindow = true,
UseShellExecute = true, //Verb only works through the shell; .NET Core and .NET 5+ default this to false
Verb = "print",
FileName = file,
Arguments = "/d:"+printDialog1.PrinterSettings.PrinterName
};
try
{
p.Start();
}
catch
{
/* your fallback code */
}
You can also play with the different options of the Windows PRINT command to get the desired output... Reference link

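After much research and googling about this task, I found that Microsoft has released a great KB article on printing a PDF without any other applications. No need to call Adobe or Ghostscript. It can print without saving a file to disk, which makes life very easy. (The KB sends the raw file bytes straight to the printer, so this only works with printers that can interpret PDF themselves.)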
http://support2.microsoft.com/?kbid=322091
