While extracting color from a PDF using iTextSharp, I get this error:
int r = renderInfo.GetColorNonStroke().R;
int g = renderInfo.GetColorNonStroke().G;
int b = renderInfo.GetColorNonStroke().B;
Error message:
Object reference not set to an instance of an object.
Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
Exception details:
System.NullReferenceException: Object reference not set to an instance of an object.
As GetColorNonStroke is not a method of the original iTextSharp TextRenderInfo, I assume you are using a version patched according to @ChrisHaas' blog entry "Getting color information from iTextSharp’s TextRenderInfo and ITextExtractionStrategy".
The members colorStroke and colorNonStroke of GraphicsState (the values of which are returned by the TextRenderInfo methods GetColorStroke and GetColorNonStroke respectively) are initialized with null.
Thus, as long as there was no explicit command before to set the stroking or non-stroking color, the respective TextRenderInfo method GetColorStroke or GetColorNonStroke will return null.
Therefore, whenever you use renderInfo.GetColorNonStroke(), check it for null before accessing its members. If the color is null, assume the default.
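A minimal sketch of that null check. The Rgb class is a hypothetical stand-in for the color object returned by the patched GetColorNonStroke(), and black is assumed as the default because it is the initial fill color of the PDF graphics state.

```csharp
using System;

// Hypothetical stand-in for the color object returned by the patched
// TextRenderInfo.GetColorNonStroke(); it only carries the R, G, B components.
public sealed class Rgb
{
    public int R { get; }
    public int G { get; }
    public int B { get; }
    public Rgb(int r, int g, int b) { R = r; G = g; B = b; }
}

public static class ColorHelper
{
    // Assumed default: black, the initial fill color of the PDF graphics state.
    public static readonly Rgb DefaultFill = new Rgb(0, 0, 0);

    // Returns the given color, or the default when no color operator was seen.
    public static Rgb OrDefault(Rgb colorOrNull)
    {
        return colorOrNull ?? DefaultFill;
    }
}
```

In the render listener itself, the same idea would probably read `var fill = renderInfo.GetColorNonStroke() ?? BaseColor.BLACK;`, which also calls the method once instead of three times.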
Furthermore, @ChrisHaas' color extension of iTextSharp's parsing capabilities uses different kinds of color objects: GrayColor, BaseColor, and CMYKColor. Depending on your requirements, you might want to test the type of the color returned to you.
In case of special colors, Chris Haas even writes:
SCN and scn themselves are catchalls for everything else that’s not RGB, CMYK or Grey. Before hitting one of those two you should actually first find a CS operator whose first and only operand is the actual color space to use. There’s a bunch of options for this including DeviceRGB, DeviceCMYK, Pattern, Lab, DeviceN, etc. You can find these in table 74 of the 2008 PDF spec section 8.6.8 (page 171). My code is actually not completely correct and I shouldn’t be pushing CS and cs to the SetStrokingGeneral method but instead should do some further processing. Unfortunately none of the samples PDFs that I had at the time had this set so I couldn’t test for it. Hopefully this helps you out!
I am working on a project that is very similar to an inventory management system. So far I have built the users form and the category form, and I was working on the products form when this error appeared. I have tried different approaches, but none of them seems to work.
In all the forms I am using stored procedures and some classes, such as insertion to add, updation to update, deletion to delete, and retrieval to view the data. The exception is "Input string was not in a correct format". Need help!
Insertion.insertProductItem(pitemtext.Text, Convert.ToSingle(priceTxt.Text), Convert.ToInt32(catDD.SelectedValue.ToString()), status);
Most probably the error message is coming from Convert.ToSingle(priceTxt.Text) or from Convert.ToInt32(catDD.SelectedValue.ToString()).
This error message simply means that the string you are trying to convert does not contain a valid representation of an int or a single. For example, the following code throws a FormatException with the message "Input string was not in a correct format".
int foo = Convert.ToInt32(" ");  // throws FormatException (note: a null string returns 0 instead of throwing)
int bar = Convert.ToInt32("5p"); // throws FormatException
//and there can be many such cases. Browse the last link at the bottom.
I would suggest checking out the int.TryParse method. It is much easier to use when the string might not actually contain an integer: it doesn't throw an exception, and you can simply write an if statement to see whether the conversion worked. Like,
int foo = 0;
if (int.TryParse(textBox1.Text, out foo)) {
    // conversion succeeded; foo holds the parsed value
}
You can also use the Single.TryParse() method for the price; read here.
Now one last thing. This exception can also be thrown when you pass Convert.ToSingle() a string whose number format does not match the CurrentCulture. In that case you could instead use Convert.ToSingle(String, IFormatProvider). You can read more here.
A similar question has been asked earlier that can be helpful. Read here.
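To make the TryParse and culture points concrete, here is a minimal sketch. PriceParser and TryParsePrice are hypothetical names, and it assumes prices are typed with '.' as the decimal separator, so it parses with the invariant culture rather than the current one.

```csharp
using System;
using System.Globalization;

public static class PriceParser
{
    // Tries to read a price typed with '.' as the decimal separator,
    // independent of the machine's current culture. Never throws:
    // returns false for null, empty, or malformed input.
    public static bool TryParsePrice(string text, out float price)
    {
        return float.TryParse(text, NumberStyles.Float,
                              CultureInfo.InvariantCulture, out price);
    }
}
```

With this, the form code can show a validation message when the method returns false instead of letting the exception escape.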
Using iTextSharp, how can I determine if a parsed chunk of text is both bolded and underlined?
Details:
I'm trying to parse .PDF files in C# specifically for text that is both bolded and underlined. Using ITextSharp, I can derive from LocationTextExtractionStrategy and get the text, the location, the font, etc. from the iTextSharp.text.pdf.parser.TextRenderInfo object passed to the overridden .RenderText method.
However, determining whether the text is bold and/or underlined from the TextRenderInfo object has not been straightforward.
I tried to use TextRenderInfo.GetFont() to find the font properties, but was unsuccessful.
I can currently determine whether the text is bold by accessing the private Graphics State field on the TextRenderInfo object and checking its .Font.PostscriptFontName property for the word "Bold" (ugly, but appears to work).
Biggest issue: I haven't found anything to determine if the text is underlined. How can I determine this?
Here is my current attempt:
private FieldInfo _gsField = typeof(TextRenderInfo).GetField("gs",
    BindingFlags.GetField | BindingFlags.NonPublic | BindingFlags.Instance);
//Automatically called for each chunk of text in the PDF
public override void RenderText(TextRenderInfo renderInfo)
{
    base.RenderText(renderInfo);
    //UNDONE:Need to determine if text is underlined. How?
    //NOTE: renderInfo.GetFont().FontWeight does not contain any actual information
    var gs = (GraphicsState)_gsField.GetValue(renderInfo);
    var textChunkInfo = new TextChunkInfo(renderInfo);
    _allLocations.Add(textChunkInfo);
    if (gs.Font.PostscriptFontName.Contains("Bold"))
        //Add this to our found collection
        FoundItems.Add(new TextChunkInfo(renderInfo));
    if (!_lineHeights.Contains(textChunkInfo.LineHeight))
        _lineHeights.Add(textChunkInfo.LineHeight);
}
Full source code of current attempt at: GitHub Repository (Two examples (example.pdf and example2.pdf) are included with text similar to what I'll be searching through.)
I tried to use TextRenderInfo.GetFont() to find the font properties, but was unsuccessful.
I can currently determine whether the text is bold by accessing the private Graphics State field on the TextRenderInfo object and checking its .Font.PostscriptFontName property for the word "Bold" (ugly, but appears to work).
I don't quite understand this differentiation. TextRenderInfo.GetFont() is exactly the same as the Font property of the private Graphics State field of TextRenderInfo.
That being said, though, this is indeed one of the major ways to determine boldness.
Bold writing in PDFs is achieved using either
- explicitly bold fonts (which is the better way); in this case one can try to determine whether or not the fonts are bold by
  - looking at the font name: it may contain a substring "bold" or something similar;
  - looking at some optional properties of the font, e.g. font weight, but beware, they are optional;
  - inspecting the embedded font file, if applicable.
  None of these methods is foolproof;
- or the same font as for non-bold text, combined with special techniques that make the text appear bold (aka poor man's bold), e.g.
  - not only filling the glyph contours but also drawing a thicker line along them for a bold impression,
  - drawing the glyph twice, the second time slightly displaced, also for a bold impression.
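The font-name check from the list above could be sketched like this. FontNameHeuristics and the marker list are assumptions for illustration; real PostScript names vary, so treat a match as a hint only.

```csharp
using System;

public static class FontNameHeuristics
{
    // Assumed list of substrings that commonly mark bold faces in font names.
    private static readonly string[] BoldMarkers = { "bold", "heavy", "black" };

    // Case-insensitive substring test; also works for subset names
    // such as "ABCDEF+Arial-BoldMT". Returns false for null or empty names.
    public static bool LooksBold(string postscriptFontName)
    {
        if (string.IsNullOrEmpty(postscriptFontName)) return false;
        foreach (string marker in BoldMarkers)
        {
            if (postscriptFontName.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0)
                return true;
        }
        return false;
    }
}
```

This cannot detect poor man's bold, where the font name is the same as for regular text.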
Underlined writing in PDFs is usually achieved by explicitly drawing a line or a very thin rectangle under the text. You can try to detect such lines by implementing IExtRenderListener, parsing the page in question with it to determine line locations, and then matching them with text positions during text extraction. Both can also be done in a single pass, but beware: the underlines need not be drawn before the text or even shortly thereafter; the PDF producer may first draw all text and only then draw all underlines. Furthermore, I've also come across a funny construction: very short (e.g. 1pt) but very wide (e.g. 50pt) vertical lines, which effectively look like horizontal ones.
IExtRenderListener extends the IRenderListener with three new methods, ModifyPath, RenderPath, and ClipPath. Whenever some path is drawn, be it a single line, a rectangle, or some very complex path, you'll first get a number of ModifyPath calls (at least one)
/**
* Called when the current path is being modified. E.g. new segment is being added,
* new subpath is being started etc.
*
* @param renderInfo Contains information about the path segment being added to the current path.
*/
void ModifyPath(PathConstructionRenderInfo renderInfo);
defining the lines and curves the path consists of, then at most one ClipPath call
/**
* Called when the current path should be set as a new clipping path.
*
* @param rule Either {@link PathPaintingRenderInfo#EVEN_ODD_RULE} or {@link PathPaintingRenderInfo#NONZERO_WINDING_RULE}
*/
void ClipPath(int rule);
(if and only if the path shall serve as clip path for the following drawing operations), and finally exactly one RenderPath call
/**
* Called when the current path should be rendered.
*
* @param renderInfo Contains information about the current path which should be rendered.
* @return The path which can be used as a new clipping path.
*/
Path RenderPath(PathPaintingRenderInfo renderInfo);
defining how that path shall be drawn (any combination of filling its interior and stroking the path itself).
I.e. for recognizing underlines, you'll have to collect the path pieces provided via ModifyPath and decide whether they might describe one or more underlines as soon as the RenderPath call comes.
Theoretically underlines could also be created differently, e.g. using a bitmap image, but I'm not aware of pdf producers doing so.
By the way, in your example PDF underlines appear consistently to be drawn using a MoveTo to the line starting point, a LineTo to its end, and then a Stroke to simply stroke the path. Thus, you'll get two ModifyPath calls (one with operation value MOVETO, one with LINETO) and one RenderPath call (with operation STROKE) respectively for each underline.
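Once the line segments and text positions have been collected, the matching step might look like the following sketch. LineSeg, TextSpan, UnderlineMatcher, and the two thresholds are hypothetical; the code assumes both kinds of coordinates have already been transformed into the same page coordinate system, and it ignores thin rectangles and the wide vertical line construction mentioned above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in: a stroked line built from MOVETO/LINETO operands.
public struct LineSeg
{
    public float X1, Y1, X2, Y2;
    public LineSeg(float x1, float y1, float x2, float y2) { X1 = x1; Y1 = y1; X2 = x2; Y2 = y2; }
}

// Hypothetical stand-in: horizontal extent and baseline of one text chunk.
public struct TextSpan
{
    public float Left, Right, BaselineY;
    public TextSpan(float left, float right, float baselineY) { Left = left; Right = right; BaselineY = baselineY; }
}

public static class UnderlineMatcher
{
    // True if some near-horizontal segment runs just below the baseline and
    // covers most of the text's width. maxGap and minCoverage are assumed thresholds.
    public static bool IsUnderlined(TextSpan text, IEnumerable<LineSeg> lines,
                                    float maxGap = 3f, float minCoverage = 0.8f)
    {
        float width = text.Right - text.Left;
        if (width <= 0) return false;
        foreach (LineSeg l in lines)
        {
            if (Math.Abs(l.Y1 - l.Y2) > 0.5f) continue;   // not horizontal
            float gap = text.BaselineY - l.Y1;
            if (gap < 0 || gap > maxGap) continue;        // above the baseline or too far below
            float overlap = Math.Min(text.Right, Math.Max(l.X1, l.X2))
                          - Math.Max(text.Left, Math.Min(l.X1, l.X2));
            if (overlap >= minCoverage * width) return true;
        }
        return false;
    }
}
```

Because underlines may be drawn long after the text, it is safest to run this matching only after the whole page has been parsed.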
The Docotic.Pdf library exposes properties that return true or false for this.
In C#
bool FONT_ITALIC = data.Font.Italic;
bool FONT_UNDERLINE = data.Font.Underline;
Check for the value of FONT_ITALIC/FONT_UNDERLINE.
I have tried to use the same, but couldn't get correct value always.
Any suggestions are welcome.
I am working on a project porting VBA code to C# for SpreadsheetGear. My team has successfully ported around 150 custom Excel add-in functions. For one of the functions, our regression spreadsheet returns #VALUE! for 4 out of 148 function calls, with the remainder returning the expected result. When I step through the code, everything functions as expected and the correct result gets written to result.Number, but the cell resolves to #VALUE! in SpreadsheetGear. I'm totally baffled! Please help me - I don't even know where to start.
The end of the function is as follows:
if (retval < 0)
{
result.Text = UtilFuncs.ErrorNum(retval);
}
else
{
result.Number = retval;
}
retval contains the correct result returned from a call to a separate DLL. And when I hover over result.Number, it also contains the correct result. But I get a #value.
I can provide more code if necessary. My big question is why it works for all but 4 of them.
Without a more concrete example, it's nearly impossible to provide a definitive answer to this. I'll take a shot in the dark, though.
If you use IArguments methods like IArguments.GetNumber(...), GetLogical(...), etc., to access your custom function's arguments, you might read through the documentation for these. For instance, see the following remark for GetLogical(...):
Non-logical arguments are converted to a logical value if possible.
Otherwise, false is returned by this method, and an internal flag is
set indicating that an error has occurred, causing the result of the
formula to be an error.
So if you pass a non-Boolean value into an argument where you are expecting a Boolean, your function may end up with this internal "error" flag set, which results in a #VALUE! error being displayed for this cell, regardless of what you set the result object to at the end of your method.
If this is the case, you can use the IArguments.ClearError(...) method to clear the error.
In a C# project, I get an error when using the AddImageFilter provided by SimpleITK. Is there a common mistake that happens when trying to add two images with this filter? For example, maybe there is a rule that the images should both be double or int.
The error I get is:
Image2 for AddImageFilter doesnt match type or dimension!
In a certain sense, your supposition is right. I wasn't able to find the exact error you got, but I found this sitkAddImageFilter implementation on GitHub. If you look at the AddImageFilter::Execute() function, at line 33, you'll find this exception being thrown:
std::cerr << "Both image for add filter don't match type or dimension!" << std::endl;
which seems related to yours (maybe the slight difference is just due to a different version of ITK). That exception is thrown whenever this condition holds:
if ( type != image2->GetDataType() || dimension != image2->GetDimension() )
So, a condition for the AddImageFilter is that both images have the same dimension and the same data type. This makes sense, because matrix addition is possible only when the dimensions match (and, of course, when the matrices contain the same kind of information).
If you are trying to add two different kinds of images (for example, a DICOM and a TIFF), I suggest converting at least one of them so that both are in the same "metadata space".
The error message is a little vague and should be improved.
Many filters that take more than one image as input expect the pixel type, dimension, size, spacing, and orientation to be the same. The error message you got indicates that the pixel type or the size does not match. It likely emanates from the code generated from this line:
https://github.com/SimpleITK/SimpleITK/blob/master/TemplateComponents/ExecuteNoParameters.cxx.in#L8
I'd recommend printing your two images as strings to examine the meta-data to determine the difference.
This is taken from Jon Skeet's excellent personal C# site (http://www.yoda.arachsys.com/csharp/):
StringBuilder first = new StringBuilder();
StringBuilder second = first;
first.Append ("hello");
first = null;
Console.WriteLine (second);
1) Changing the value of first will not change the value of second -
2) although while their values are still references to the same object, any changes made to the object through the first variable will be visible through the second variable.
This is taken from the same sentence. What is meant by changing the value? I assume the value of a variable (eg int x = 4, or 5, or 45, etc).
Does this mean that if first points to another compatible object, it won't have an effect on second?
Everything on that page makes sense, I think it's just an issue with my interpretation of the English.
Thanks
first is a reference to an object of type StringBuilder. That is, first stores a value that can be used to refer to an object on the heap that is of type StringBuilder. second is another reference to an object of type StringBuilder, and its value is initially set to refer to the same object that first is referring to.
If you change the value of first, what you are doing is changing what the referent is. That is, you are using first to refer to a different object. This does not impact second; its value is unaffected by changes to the value of first. Remember: the values of first and second are references that initially have the same referent. But just like with
int x = 1;
int y = x;
x = 2;
does not change the value of y, changing the value of first does not change the value of second.
On the other hand, when first and second refer to the same object, any changes to that object will be visible through both first and second.
Think of it like this. Let's say I create a text file first.html whose contents are
<a href="http://stackoverflow.com">Stack Overflow</a>
and I issue the command copy first.html second.html. Then both pages can be used to refer to the same webpage; by following the link we arrive at the same referent. If changes are made to the Stack Overflow home page, then accessing the homepage through either first.html or second.html will allow me to see those changes. But if I then change the contents of first.html to be
<a href="http://www.thedailywtf.com">The Daily WTF</a>
then I can no longer use first.html to refer to the Stack Overflow homepage. Moreover, this change does not impact the value of second.html. It is still
<a href="http://stackoverflow.com">Stack Overflow</a>
Think of the contents of these files as the values of a reference type, and the ultimate destination as the referent object.
The difference between the value of the object itself and the contents of the object is not clear.
For example, it is possible to change the contents of second by calling methods on first, as in the call to Append in the example. However, setting the value of first to null does not set second to null.
You can see this easily by writing this code and stepping through it in a debugger.
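For instance, a small sketch along the lines of the snippet from the question. ReferenceDemo and Run are hypothetical names, and the extra reassignment to a new StringBuilder is added here only to illustrate the "another compatible object" case.

```csharp
using System;
using System.Text;

public static class ReferenceDemo
{
    // Returns what is observable through 'second' after the object is
    // mutated through 'first' and 'first' is then reassigned.
    public static string Run()
    {
        StringBuilder first = new StringBuilder();
        StringBuilder second = first;        // both variables refer to one object

        first.Append("hello");               // changes the object: visible via second
        first = new StringBuilder("other");  // changes the variable: second is unaffected
        first = null;                        // likewise

        return second.ToString();            // "hello"
    }
}
```

Only the Append call touches the shared object; both assignments to first change just the value of that one variable.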