Powerpoint "Save As Picture" from C# Microsoft.Office.Interop.PowerPoint - c#

My question is pretty similar to this one and I'm afraid the answer is the same... I want to save all the shapes/images on a slide as a single png (or jpeg). Programmatically, I get as far as
slide.Shapes.SelectAll();
but don't see a way to save as an image. Is this possible? If not, any other suggestions, hopefully with examples? (Not VBA - I need to automate the whole conversion.)
There was a reference to OpenXML in the other post, but I'm not even sure how to pull that in.

I don't know how you'd do this in C#, but I'd guess you'd make use of the same methods as you would with VBA, where you can do:
ActiveWindow.Selection.ShapeRange.Export "c:\temp\delete-me.jpg", ppShapeFormatJPG
ppShapeFormatJPG is a PowerPoint constant (a VBA Long = 1); in C#, the interop assembly exposes it on the PpShapeFormat enum.
The method can also take two more optional parameters, ScaleWidth and ScaleHeight, which govern the width and height of the exported image in undocumented ways. By default, with no parameters supplied, I get exports at 72 dpi. Larger numbers result in higher pixel-count exports but distorted proportions. I'm sure there's some strange logic to it, but it escapes me; all hints welcome!
There's a third optional parameter, ExportMode. In my tests it makes no difference whether you supply it or not, nor which of the available values you choose.
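A hedged C# sketch of the same call via interop (assuming an open presentation with an active window, C# 4+ for the optional COM parameters, and a placeholder output path):
using Microsoft.Office.Interop.PowerPoint;

// "slide" is the Slide from the question; select everything on it,
// then export the resulting ShapeRange as a single image.
slide.Shapes.SelectAll();
ShapeRange range = slide.Application.ActiveWindow.Selection.ShapeRange;
range.Export(@"c:\temp\delete-me.jpg", PpShapeFormat.ppShapeFormatJPG);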

Related

System.Drawing.Drawing2D.GraphicsPath.PathData documentation/examples

I want to write a converter from GraphicsPath to a different format.
MSDN seems very terse on this topic and doesn't help create a full picture - I can't be sure I can handle all point types correctly.
PathPointType Enum,
GraphicsPath.PathTypes Property
- both repeat approximately the same description, which doesn't give the full picture to someone who isn't already familiar with what it's supposed to mean.
I can dump some time into reversing it from hand-crafted examples, but I can't estimate and justify the time for that and I'm not sure I can cover all cases.
Is there an article/example that is targeted for custom rendering of GraphicsPath rather than for consumer code and also goes into greater detail?
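For context, here's a minimal sketch of the enumeration I'm starting from; the decoding of the flag bits reflects my current reading of the PathPointType docs (low three bits = segment type, high bit = close-subpath marker):
using System;
using System.Drawing;
using System.Drawing.Drawing2D;

static void DumpPath(GraphicsPath path)
{
    PathData data = path.PathData;
    for (int i = 0; i < data.Points.Length; i++)
    {
        byte t = data.Types[i];
        // Low three bits: Start = 0, Line = 1, Bezier3 = 3.
        var kind = (PathPointType)(t & (int)PathPointType.PathTypeMask);
        // 0x80 flag: this point also closes the current subpath.
        bool closes = (t & (int)PathPointType.CloseSubpath) != 0;
        Console.WriteLine("{0} {1}{2}", data.Points[i], kind, closes ? " [close]" : "");
        // Bezier3 points arrive in groups of three: two control points,
        // then the segment's end point.
    }
}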

Excel-DNA: grouping rows via C API feature of Excel-DNA

I'm familiar with how to group a range in Excel VSTO/COM interop:
ws.EnableOutlining = true;
ws.Outline.SummaryRow = XlSummaryRow.xlSummaryAbove;
var rng = GetRangeSomeHow();
rng.EntireRow.Group();
rng.EntireRow.OutlineLevel = someLevel;
What is the most efficient way to do this in Excel-DNA? I would imagine there must be a C-API way to do it, encapsulated cleverly in Excel-DNA somehow, but for the life of me, I can't figure it out via online documentation (incl. Google).
There are a lot of posts using code similar to my sample above, but these are pretty expensive calls, especially since I need to do this ~5000 times overall (I have a really big data set).
EDIT:
So there seems to be this method call:
XlCall.Excel(XlCall.xlfGroup...)
The only problem is, I have no idea what the parameters are. It seems an ExcelReference should be passed in, but how is the .EntireRow resolved? Will the C API just handle it for me - in which case I just need to pass a new ExcelReference(1,100,1,1) and be done with it... or is there more to this?
Thanks in advance to anyone who can answer my question!
I don't think the C API GROUP function is the one you're looking for. The documentation says:
GROUP
Creates a single object from several selected objects and returns the object identifier of the group (for example, "Group 5"). Use GROUP to combine a number of objects so that you can move or resize them together.
If no object is selected, only one object is selected, or a group is already selected, GROUP returns the #VALUE! error value and interrupts the macro.
I'd suggest you use the COM object model for this kind of thing, even in an Excel-DNA add-in. The C API has not really been updated over the years for general sheet manipulation like this, so you're likely to run into features that don't work right or are incomplete relative to the COM object model.
From your Excel-DNA add-in, just make sure you get hold of the right Application root object with a call to ExcelDnaUtil.Application.
For improved performance with this kind of sheet editing, you pretty much have to use the same tricks as from VBA or VSTO - disable screen updating and calculation, etc.
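A minimal sketch of that approach, assuming the Excel interop assemblies are referenced (the range is a placeholder):
using ExcelDna.Integration;
using Excel = Microsoft.Office.Interop.Excel;

var app = (Excel.Application)ExcelDnaUtil.Application;
app.ScreenUpdating = false;
app.Calculation = Excel.XlCalculation.xlCalculationManual;
try
{
    var ws = (Excel.Worksheet)app.ActiveSheet;
    ws.EnableOutlining = true;
    ws.Outline.SummaryRow = Excel.XlSummaryRow.xlSummaryAbove;
    // Do all ~5000 groupings inside this one screen-updating /
    // manual-calculation window.
    Excel.Range rng = ws.Range["A2:A100"]; // placeholder range
    rng.EntireRow.Group();
}
finally
{
    app.Calculation = Excel.XlCalculation.xlCalculationAutomatic;
    app.ScreenUpdating = true;
}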

Approval Tests Image comparison with mask

Is it possible to compare two images with a mask for areas that do not need to be compared?
I managed to get it working with a basic file comparison:
[UseReporter(typeof(BeyondCompareReporter))]
public void ThenThePageShouldMatchTheApprovedVersion()
{
SaveScreenshot("page1");
Approvals.VerifyFile(@"C:\page1.png");
}
But I would like to create a mask of the areas I expect to change. Is this possible with ApprovalTests, or will I need to modify the screenshot and manually apply the mask before comparing with the approved file? Or is it possible to write your own validators?
It's not possible to mask areas so that the comparer will not compare them.
However, it is very easy to actually mask the area yourself (i.e., place a black square over the area before you call Verify).
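A hedged sketch of that (the path, rectangle, and PNG format are placeholders; it writes the masked copy next to the original so the approved file stays a plain image):
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using ApprovalTests;

static void VerifyMasked(string sourcePath, Rectangle volatileArea)
{
    string maskedPath = Path.ChangeExtension(sourcePath, ".masked.png");
    using (var bmp = new Bitmap(sourcePath))
    {
        using (var g = Graphics.FromImage(bmp))
        {
            // Black square over the region that's allowed to change.
            g.FillRectangle(Brushes.Black, volatileArea);
        }
        bmp.Save(maskedPath, ImageFormat.Png);
    }
    Approvals.VerifyFile(maskedPath);
}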
Alternatively, you can usually mock out the variable that is changing.
Details on Comparer:
ApprovalsFileComparer is a very stupid comparer. It knows nothing about file formats and has no idea what an image is. It simply compares byte by byte. This simplicity allows it to work everywhere, but removes the ability to be smart about stuff. This is usually not an issue, as the reporters are very, very smart - able to render, compare, and do subtractive diffs and the like.
Happy Testing!

How to detect keyword stuffing?

We are working on a kind of document search engine - primarily focused on indexing user-submitted MS Word documents.
We have noticed that there is keyword-stuffing abuse.
We have determined two main kinds of abuse:
Repeating the same term again and again
Many irrelevant terms added to the document en masse
These two forms of abuse are enabled by either adding text with the same font colour as the background colour of the document, or by setting the font size to something like 1px.
Determining whether the background colour is the same as the text colour is tricky, given the intricacies of MS Word layouts - and the same goes for font size, as any cut-off seems potentially arbitrary; we may accidentally remove valid text if we set the cut-off too large.
My question is: are there any standardized pre-processing or statistical analysis techniques that could be used to reduce the impact of this kind of keyword stuffing?
Any guidance would be appreciated!
There's a surprisingly simple solution to your problem using the notion of compressibility.
If you convert your Word documents to text (you can easily do that on the fly), you can then compress them (for example, using the free zlib library) and look at the compression ratios. Normal text documents usually have a compression ratio of around 2, so any significant deviation would mean they have been "stuffed". The analysis is extremely easy; I have analyzed around 100k texts and it takes about a minute using Python.
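A hedged C# sketch of that heuristic, using .NET's DeflateStream in place of raw zlib (the ratio interpretation is as described above):
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static double CompressionRatio(string text)
{
    byte[] raw = Encoding.UTF8.GetBytes(text);
    using (var ms = new MemoryStream())
    {
        using (var deflate = new DeflateStream(ms, CompressionLevel.Optimal, leaveOpen: true))
        {
            deflate.Write(raw, 0, raw.Length);
        }
        // Normal prose lands around 2; heavily repeated text compresses
        // much further, so a high ratio is suspicious.
        return (double)raw.Length / ms.Length;
    }
}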
Another option is to look at the statistical properties of the documents/words. To do that, you need a sample of "clean" documents, from which you calculate the mean frequency of the distinct words as well as their standard deviations.
After you have done that, you can take a new document and compare it against the mean and the deviation. Stuffed documents will be characterized as those with a few words with a very high deviation from the mean for that word (documents where one or two words are repeated several times), or many words with high deviations (documents with blocks of text repeated).
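A minimal sketch of that comparison, assuming you have already built a stats table (word -> mean frequency and standard deviation) from the clean sample:
using System;
using System.Collections.Generic;
using System.Linq;

static int CountOutlierWords(IDictionary<string, (double Mean, double Std)> stats,
                             string document, double zLimit = 3.0)
{
    string[] words = document.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    var freq = words.GroupBy(w => w.ToLowerInvariant())
                    .ToDictionary(g => g.Key, g => (double)g.Count() / words.Length);
    // Words whose in-document frequency deviates strongly from the
    // clean-corpus mean: one or two huge outliers suggests repetition,
    // many moderate ones suggests pasted blocks.
    return freq.Count(kv => stats.TryGetValue(kv.Key, out var s)
                            && s.Std > 0
                            && Math.Abs(kv.Value - s.Mean) / s.Std > zLimit);
}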
Here are some useful links about compressibility:
http://www.ra.ethz.ch/cdstore/www2006/devel-www2006.ecs.soton.ac.uk/programme/files/pdf/3052.pdf
http://www.ispras.ru/ru/proceedings/docs/2011/21/isp_21_2011_277.pdf
You could also probably use the concept of entropy, for example Shannon entropy (see http://code.activestate.com/recipes/577476-shannon-entropy-calculation/).
Another possible solution would be to use part-of-speech (POS) tagging. I reckon that the average percentage of nouns is similar across "normal" documents (37% according to http://www.ingentaconnect.com/content/jbp/ijcl/2007/00000012/00000001/art00004?crawler=true). If the percentage were higher or lower for some POS tags, then you could possibly detect "stuffed" documents.
As Chris Sinclair commented on your question, unless you have Google-level algorithms (and even they get it wrong, and thereby have an appeal process), it's best to flag likely keyword-stuffed documents for further human review...
If a page has 100 words, and you search through the page counting the occurrences of keywords (rendering stuffing via 1px fonts or background colours irrelevant), you get a keyword density. There is no hard and fast percentage that is 'always' keyword stuffing; generally 3-7% is normal. Perhaps if you detect 10%+ you flag it as 'potentially stuffed' and set it aside for human review.
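A minimal sketch of that density check (the whitespace tokenization and the 10% threshold are deliberately naive):
using System;
using System.Linq;

static double KeywordDensity(string text, string keyword)
{
    string[] words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    if (words.Length == 0) return 0;
    int hits = words.Count(w => w.Equals(keyword, StringComparison.OrdinalIgnoreCase));
    return (double)hits / words.Length;
}

// Flag for human review rather than rejecting outright:
// bool suspicious = KeywordDensity(docText, term) > 0.10;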
Furthermore, consider these scenarios (taken from here):
Lists of phone numbers without substantial added value
Blocks of text listing cities and states a webpage is trying to rank for
and consider what the context of a keyword is.
Pretty damn difficult to do correctly.
Detect tag abuse with forecolor/backcolor detection like you already do.
For size detection, calculate the average text size and remove the outliers.
Also set predefined limits on the text size (like you already do).
Next up is the structure of the tag "blobs":
For your first point, you can just count the words, and if one occurs too often (maybe 5x more often than the second-most-frequent word) you can flag it as a repeated tag (see the sketch below).
When tags are added en masse, the user often adds them all in one place, so you can check whether known "fraud tags" appear next to each other (maybe with one or two words in between).
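A hedged sketch of the repeated-tag check; the 5x factor is just the illustrative threshold from above:
using System;
using System.Linq;

static bool HasRepeatedTag(string text, double factor = 5.0)
{
    var counts = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                     .GroupBy(w => w.ToLowerInvariant())
                     .Select(g => g.Count())
                     .OrderByDescending(n => n)
                     .ToList();
    // Flag when the most frequent word dwarfs the runner-up.
    return counts.Count >= 2 && counts[0] >= factor * counts[1];
}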
If you can identify at least some common "fraud tags" and want to get a bit more advanced, you could do the following:
Split the document into parts with the same text size/font and analyze each part separately. For better results, group parts that use nearly the same font/size, not only those that have exactly the same font/size.
Count the occurrences of each known tag, and when some limit you set is exceeded, either remove that part of the document or flag the document as "bad" (as in "uses excessive tags").
No matter how advanced your detection is, as soon as people know it's there and more or less know how it works, they will find ways to circumvent it.
When that happens, you should just flag the offending documents and look through them yourself. Then, if you notice that your detection algorithm produced a false positive, you improve it.
If you notice a pattern, such as the common stuffers always using a font size below a certain threshold (say 1-5px, which is not really readable), then you can assume that that is the "stuffed" part.
You can then go on to check whether the font colour is also the same as the background colour, and remove that section.

handling pixels in xsl fo

I have a Web front end, and I am trying to handle layout with tables. Because my tables all contain a column with a width in pixels, what is the best way to handle it inside the PDF to get a consistent layout...?
I am using FO.NET, and the code I use to convert pixels to inches is below. However, on different machines I am getting inconsistent results...
<xsl:value-of disable-output-escaping="yes" select="floor(@width div 72)"/>
<xsl:text>in</xsl:text>
Is there a way, using C#, to get the screen resolution and any other info needed to get a more accurate result?
To answer my own question: use % with a fixed max width on the table; that way I can get the XSL to work out the % of each column based on the total width of the table. This workaround seems like the best and most flexible way to handle my situation. The biggest problem is that pixels cannot be translated into XSL-FO - if a person is working on a page and then moves to a different machine, the output PDF could be drastically different.
On that note: I would like mm to be supported alongside pixels in WYSIWYG editors... As I am using jQuery client-side, I will most likely enhance my tables, therefore requiring me to create this plugin. I hope this info helps anyone else who wants to create PDFs from client-side WYSIWYG editors; I am sure it is applicable to other scenarios too... :)
The class System.Windows.Forms.Screen has several properties and methods concerning the screen.
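A hedged sketch of reading those values; note the DPI that actually governs the px-to-inch conversion comes from a Graphics object, and on Windows it is typically 96 rather than the 72 used in the XSL above:
using System;
using System.Drawing;
using System.Windows.Forms;

Rectangle bounds = Screen.PrimaryScreen.Bounds; // pixel size of the display
using (var g = Graphics.FromHwnd(IntPtr.Zero))
{
    float dpiX = g.DpiX;           // typically 96 on Windows
    double inches = 150 / dpiX;    // e.g. converting a 150px column width
}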
