I am trying to figure out a way to use Sikuli's image recognition from within C#. I don't want to use Sikuli itself because its scripting language is a little slow, and because I really don't want to introduce a Java bridge in the middle of my .NET C# app.
So, I have a bitmap which represents an area of my screen (I will call this region BUTTON1). The screen layout may have changed slightly, or the screen may have been moved on the desktop -- so I can't use a direct position. I have to first find where the current position of BUTTON1 is within the live screen. (I tried to post pictures of this, but I guess I can't because I am a new user... I hope the description makes it clear...)
I think that Sikuli is using OpenCV under the covers. Since it is open source, I guess I could reverse engineer it, and figure out how to do what they are doing in OpenCV, implementing it in Emgu.CV instead -- but my Java isn't very strong.
I looked for examples showing this, but all of the examples are either extremely simple (i.e., how to recognize a stop sign) or very complex (i.e., how to do facial recognition)... and maybe I am just dense, but I can't seem to make the jump in logic of how to do this.
Also I worry that all of the various image manipulation routines are actually processor intensive, and I really want this as lightweight as possible (in reality I might have lots of buttons and fields I am trying to find on a screen...)
So, the way I am thinking about doing this instead is:
A) Convert the bitmaps to byte arrays and do brute force search. (I know how to do that part). And then
B) Use the byte array position that I found to calculate its screen position (I'm really not completely sure how I do this) instead of using the image processing stuff.
Is that completely crazy? Does anyone have a simple example of how one could use Aforge.Net or Emgu.CV to do this? (Or how to flesh out step B above...?)
Thanks!
Generally speaking, it sounds like you want basic object recognition. I don't have any experience with SIKULI, but there are a number of ways to do object recognition (edge-based template matching, etc.). That being said, you might be able to go with just straight histogram matching.
http://www.codeproject.com/KB/GDI-plus/Image_Processing_Lab.aspx
That page should show you how to use AForge.net to get the histogram of an image. You would just do a brute force search using something like this:
Bitmap ImageSearchingWithin = new Bitmap("Location of image"); // or just load from a screenshot or whatever

// Slide a window the size of the image you are searching for across the larger image.
// Note the <= so the last valid row/column of candidate positions is included.
for (int x = 0; x <= ImageSearchingWithin.Width - WidthOfImageSearchingFor; ++x)
{
    for (int y = 0; y <= ImageSearchingWithin.Height - HeightOfImageSearchingFor; ++y)
    {
        using (Bitmap MySmallViewOfImage = ImageSearchingWithin.Clone(
            new Rectangle(x, y, WidthOfImageSearchingFor, HeightOfImageSearchingFor),
            System.Drawing.Imaging.PixelFormat.Format24bppRgb))
        {
            // compare this window's histogram to BUTTON1's histogram here
        }
    }
}
And then compare the newly created bitmap's histogram to the one that you calculated of the original image (whatever area is the closest in terms of matching is what you would select as being the region of BUTTON1). It's not the most elegant solution but it might work for your needs. Otherwise you get into more difficult techniques (of course I could be forgetting something at the moment that might be simpler).
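For the comparison step, something naive like the sketch below would do; this uses plain System.Drawing (GetPixel is slow, so for real use you'd want LockBits or AForge's ImageStatistics class), and HistogramOf / HistogramDistance are just illustrative names, not anything out of AForge:

// Build a 256-bin grayscale histogram of a bitmap (the target image or one
// of the cloned windows from the loop above).
static int[] HistogramOf(Bitmap bmp)
{
    var bins = new int[256];
    for (int y = 0; y < bmp.Height; y++)
        for (int x = 0; x < bmp.Width; x++)
        {
            Color c = bmp.GetPixel(x, y);
            bins[(c.R + c.G + c.B) / 3]++;   // simple luminance approximation
        }
    return bins;
}

// Compare two histograms by sum of absolute bin differences; smaller means more similar.
static long HistogramDistance(int[] a, int[] b)
{
    long distance = 0;
    for (int i = 0; i < 256; i++)
        distance += Math.Abs(a[i] - b[i]);
    return distance;
}

The (x, y) at which the distance to BUTTON1's histogram is smallest is the top-left corner of the best match within the screenshot, which (assuming the screenshot covers the full screen) is also the screen position you were after in step B.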
I'm trying to create a grid system for a 2D game. Not completely sure if I'll go with a hexagon-grid or isometric, but which shouldn't matter too much here. Basically, I'm trying to create a tower defense game where you deploy units instead of towers, and can move those as if you were playing a tactical game. The problem is that the tiles themselves are gonna be complex - there will be different types of terrain. Some units can only be deployed in X, and can't go through Y, that kind of thing. I also need the ability to add some logic to these tiles at will - who knows, maybe I want a special tile to give extra attack range to units above it, that kind of thing. The game must also "feel" like a grid - making things snap to the center of the tiles, as well as highlighting the tiles on hover and when moving/attacking.
Okay, this leads me to a pretty obvious route: I can create prefabs for the different types of tiles I need, add all the properties and logic as script components, and create a grid class that instantiates each tile in the world. This works great, cause this way I have full control over everything - I can do whatever I want with the tiles, and I can also create a 2D matrix for their positions as I instantiate them. This way I can call tile[3, 6] for example, which sounds like a huge deal for pathing, highlighting and such. I can also link whatever gameobject is on top of it to the tile itself, so I could call something like tile[6, 2].ObjectOnTop.~WhateverInfoINeedFromIt, which also sounds super handy for overall logic.
But the shortcomings are also terrible - how do I even design and deploy different level designs? The only way I can think of is to figure out a way to do it all by hand, export that info somehow to a JSON file, and have the grid class that instantiates everything select which tile will be instantiated where based on the JSON info. I not only have no idea how to actually implement that, but I also think it would be an absurd amount of work for something that is supposed to be natural. I'm also not sure if a gameobject for each tile is a good idea in terms of performance. The biggest problem? It's easy to create such a grid if it's a simple squared-tiles grid - but when we start talking about hexagons and isometric grids... it's not nearly as easy, honestly. The complex shapes make it so difficult to work with this kind of thing. For example, how do I even convert the mouse position into the equivalent tile? It's super easy for squares... not so much for the rest. It also kind of sucks that the grid is only really deployed when the game runs (is this generally considered a bad thing that I should avoid, btw?).
But don't worry, cause I've found the solution: tilemaps! Of course! It fixes all the problems I have, right? Supposedly yes, but it also removes all the power I had from having prefabs. Now I can't have any logic with tiles... They can't store any properties... so I'm doomed, I guess. I've read a bit on ways to overcome this (prefab brushes, custom classes inheriting from Tile, making a tilemap for each tile type), but they are honestly extremely niche and just don't feel right.
It's so weird, a generic grid system like this was supposed to be so simple and common. But I can barely find any information at all. It's like I'm missing this pretty obvious tool that no one seems to mention cause it's that obvious. So here I am, struggling to start a project cause I can't even figure out how to implement the basic structure of the game. Everything I see online leads me to tilemaps - but they only work for very basic stuff, from what I understood. They won't work for this kind of game, I think. I have no idea what to do at this point - there must be an optimal way to solve everything, the one that is likely used by all the devs who work on this kind of game. And honestly, there are a ton of them.
So, please, shed some light on this. And thanks a lot in advance!
(Here's someone that posted an extremely similar question: Per Tilemap Tile Data Storage & Retrieval)
I have the exact same problem. The best I found is this: https://m.youtube.com/playlist?list=PLzDRvYVwl53uhO8yhqxcyjDImRjO9W722
It is a playlist by CodeMonkey where he creates a grid by script, but also adds generics, which allow you to have the logic you want, and adds visual tiles. The only problem is that I want squares while you want different shapes. I don't know how to do that; maybe you can adjust the grid (more likely the tilemap it uses to display the visuals), or maybe you should just let go of the different shapes, I don't know.
Hope I helped at least a little. I saw this post and it is very sad that no one replied to you. Hope you can achieve what you want and that it will be a success. Good luck and, mainly, have fun :)
Converting the mouse position to a tile position can be done in different ways; the simplest would be to raycast from the mouse position and see which tile it hits (a quick sketch follows below).
It's probably easiest to make a two-dimensional array of tiles of a class 'Tile'; even if it's hex, break it down into a 2D array with an offset, and set up the tiles dynamically in code.
For pathing and such, Dijkstra's algorithm is pretty useful:
https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm
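To make the raycast idea concrete, here is a minimal sketch; the Tile component and its fields are just an illustration of how per-tile data and grid coordinates could live on each prefab, and it assumes the tiles have 3D colliders (for a purely 2D physics setup, Physics2D.GetRayIntersection is the equivalent call):

using UnityEngine;

// Illustrative per-tile component: each tile prefab stores its grid coordinates
// and whatever terrain/logic flags the game needs.
public class Tile : MonoBehaviour
{
    public int GridX;
    public int GridY;
    public bool Walkable = true;
}

public class TilePicker : MonoBehaviour
{
    void Update()
    {
        if (!Input.GetMouseButtonDown(0)) return;

        // Cast a ray from the camera through the mouse position and see which
        // tile collider it hits.
        Ray ray = Camera.main.ScreenPointToRay(Input.mousePosition);
        RaycastHit hit;
        if (Physics.Raycast(ray, out hit))
        {
            Tile tile = hit.collider.GetComponent<Tile>();
            if (tile != null)
                Debug.Log("Clicked tile [" + tile.GridX + ", " + tile.GridY + "]");
        }
    }
}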
Maybe it can help:
grid = new gridSpace[roomWidth + 3, (roomWidth / 2) + 2];
int xStep = 4; // 2:1 isometry type
for (int x = 0; x < roomWidth - 1; x += xStep) // xStep depends on the type of isometry
{
    int minX = x;
    int maxX = x + 2;
    int localX = x;
    for (int y = roomWidth / 2; y > -1; y--)
    {
        grid[localX, y] = gridSpace.empty;
        // alternate between the two staggered columns of this isometric band
        if (localX == minX)
        {
            localX = maxX;
        }
        else
        {
            localX = minX;
        }
    }
}
Hi, I'm working on a program to load DICOM files (a medical file format for images) and display them in a Mac OS Cocoa app. I'm coding in C# using Visual Studio.
I've gotten to the point where I can make a 2D array of ints representing the intensity; there is no color.
I do not know how to turn this into an NSImage so that I can use it as an image in Cocoa.
Suggestions?
There are a variety of ways to accomplish this, depending on your needs, but the "modern" method is to use the NSImage class factory method:
+ (instancetype)imageWithSize:(NSSize)size flipped:(BOOL)drawingHandlerShouldBeCalledWithFlippedContext drawingHandler:(BOOL (^)(NSRect dstRect))drawingHandler;
This method takes a block parameter with the code that turns your data into an image, just as if you were drawing it onto the screen for a custom NSView implementation. (If you've never done this before, start with the Cocoa Drawing Guide: "Views and Drawing" as well as "Graphics Contexts".)
The factory method is smart: unless the resolution of your screen changes, the drawing handler will only be called once and the resulting image cached in the NSImage's representation.
The next alternative (which might actually be easier for you) is to translate your int array into an RGBA (or whatever) buffer representing the pixel values in the image. This process begins by creating a specific NSBitmapImageRep (this is a "representation" object—each NSImage is a collection of one or more representations).
This will require some research into the various pixel buffer formats, but it's also pretty easy to do. Once you've built your 2D pixel buffer representation, you can turn that directly into an NSImage.
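Whichever representation route you take, the first step is mapping your raw intensities onto pixel values. A minimal sketch of that step in C# is below; it uses plain min/max scaling, whereas a real DICOM viewer would normally apply the window/level transform from the file instead:

// Map a 2D array of raw DICOM intensities to an 8-bit grayscale buffer
// (one byte per pixel, row by row).
static byte[] ToGrayscaleBuffer(int[,] intensities)
{
    int width = intensities.GetLength(0);
    int height = intensities.GetLength(1);

    // Find the value range so it can be stretched to 0..255.
    int min = int.MaxValue, max = int.MinValue;
    foreach (int v in intensities)
    {
        if (v < min) min = v;
        if (v > max) max = v;
    }
    double range = Math.Max(1, max - min);

    var buffer = new byte[width * height];
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            buffer[y * width + x] = (byte)(255.0 * (intensities[x, y] - min) / range);
    return buffer;
}

That flat buffer (one grayscale sample per pixel) is what you would then hand to the NSBitmapImageRep you build.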
These, and other techniques, are explained in the Cocoa Drawing Guide: "Creating NSImage Objects".
I'm using Oxyplot HeatMapSeries for representing some graphical data.
For a new application I need to represent the data with isosurfaces, something looking like this:
Some ideas around this:
I know the ContourSeries can do the isolines, but I can't find any option that allows me to fill the gaps between lines. Does this option exist?
I know the HeatMapSeries can be shown under the ContourSeries so I can get a similar result, but it does not fit our needs.
Another option would be limiting the HeatMapSeries colours and eliminating the interpolation. Is this possible?
If anyone has another approach to the solution, I would love to hear it!
Thanks in advance!
I'm evaluating whether Oxyplot will meet my needs and this question interests me... from looking at the ContourSeries source code, it appears to be only for finding and rendering the contour lines, but not filling the area between the lines. Looking at AreaSeries, I don't think you could just feed it contours because it is expecting two sets of points which when the ends are connected create a simple closed polygon. The best guess I have is "rasterizing" your data so that you round each data point to the nearest contour level, then plot the heatmap of that rasterized data under the contour. The ContourSeries appears to calculate a level step that does 20 levels across the data by default.
My shortcut for doing the rasterizing based on a step value is to divide the data by the level step you want, then truncate the number with Math.Floor.
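In code, that rasterization might look roughly like this (levelStep being whatever step the ContourSeries ends up using):

// Snap every value down to the nearest multiple of levelStep, so the heat map
// shows flat bands between the isolines instead of a smooth gradient.
static double[,] SnapToLevels(double[,] data, double levelStep)
{
    int nx = data.GetLength(0);
    int ny = data.GetLength(1);
    var snapped = new double[nx, ny];
    for (int i = 0; i < nx; i++)
        for (int j = 0; j < ny; j++)
            snapped[i, j] = Math.Floor(data[i, j] / levelStep) * levelStep;
    return snapped;
}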
Looking at HeatMapSeries, it looks like you can possibly try to turn interpolation off, use a HeatMapRenderMethod.Rectangles render method, or supply a LinearColorAxis with fewer steps and let the rendering do the rasterization perhaps? The Palettes available for a LinearColorAxis can be seen in the OxyPalettes source: BlueWhiteRed31, Hot64, Hue64, BlackWhiteRed, BlueWhiteRed, Cool, Gray, Hot, Hue, HueDistinct, Jet, and Rainbow.
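If the colour-axis route pans out, the setup would be roughly along the lines below; since I haven't run it, treat the property names as read-from-source rather than verified:

// using OxyPlot; using OxyPlot.Axes; using OxyPlot.Series;
PlotModel BuildBandedHeatMap(double[,] data)
{
    var model = new PlotModel { Title = "Banded heat map" };

    // A palette with only a handful of colours makes the colour axis itself do
    // the banding instead of smoothly interpolating.
    model.Axes.Add(new LinearColorAxis
    {
        Position = AxisPosition.Right,
        Palette = OxyPalettes.Jet(8)
    });

    model.Series.Add(new HeatMapSeries
    {
        X0 = 0, X1 = data.GetLength(0), Y0 = 0, Y1 = data.GetLength(1),
        Data = data,
        Interpolate = false,                        // no smoothing between cells
        RenderMethod = HeatMapRenderMethod.Rectangles
    });

    return model;
}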
I'm not currently in a position to run OxyPlot to test things, but I figured I would share what I could glean from the source code and limited documentation.
For my final exam (to graduate from university) I've been asked to create a tiny compiler, with the following requirements:
The student must develop a basic compiler with all the design parts that make it up (lexical analysis, syntax analysis, parsing, etc.).
This compiler will have an interface that shows 2 panels: a graphic representation (A) and a code representation (B). The user will be able to draw a geometric shape in panel (A), and the program will show in panel (B) the code generated for that shape; if the user types code in (B), it will show the shape in (A).
This compiler must handle at least 7 primitives (I guess this means commands). The geometric shape must be created from the primitives.
The student will have to include a primitive to rotate the shape.
So the thing is that we never studied compilers in depth, just the very basic theory, and I only have 20 days to finish this! I'm pretty sure they want to make me flunk, because I asked the professor to tell me what a primitive is and he said he wouldn't answer that because it is part of the vocabulary of the course I want to pass.
So the question here is:
How should I start? How do I create this thing in .NET, and how do I create my very small set of instructions to create geometric shapes?
Is there something out there similar to this requirement to take it as an example and modify it?
P.S.: I'm a .NET C# developer (good skills). I know the basics of C and Java. I'm reading about parser generators (ANTLR, lex & YACC, Ray) but there's no basic tutorial for them, and there are a lot of new terms like BNF grammar (what is this, a language, a txt file?). It is so hard because there's no easy way to start, and no samples for C#.
I don't want to do this project in C or C++: since it uses graphics and my C knowledge is basic, I'm afraid I won't be able to do it. I would like to use .NET.
This isn't so much a compiler as an interpreter/designer. But I digress.
Basically what you are being asked to create is a "drawing command language", and a program that can interpret that command language. For an example of what a "drawing command language" is usually expected to do, take a look at LOGO.
What you are required to do is to define a simple set of instructions (primitives) that will, in the proper combination, cause a shape to be drawn. You will also have to include a primitive to rotate the shape. Here's the Wikipedia definition of "primitive" in the appropriate context. By doing this, you are creating a "language" and a "runtime"; theoretically you could save the commands in a file, then re-load them into the program and re-run them to generate the same shape.
There are three major ways you could go with this:
Define primitives to draw different types of lines (straight, curved, solid, dashed, etc etc) and set the color with which to draw the next line(s). This would likely have you creating primitives just to create primitives; your main primitives will be "Set Color" and "Draw Line".
Define primitives to draw various pre-defined shapes (line, circle, rectangle, pentagon, hexagon, etc etc). This is probably what your classmates will do, and it's going to both take a while and not be very functional.
Implement "turtle drawing" much like LOGO. There would be a cursor (the "turtle") that is represented on-screen somehow, and its current location and where it goes is integral to the drawing of lines.
Personally I like the last idea; you'll need primitives to move the turtle, to mark the start and end positions of lines, set colors, rotate, clear, etc:
MVUP x - Move turtle up by x pixels
MVDN x - Move turtle down by x pixels
MVLT x - Move turtle left by x pixels
MVRT x - Move turtle right by x pixels
SETC r g b - Set the line-drawing color to an RGB value
STLN - Mark the start of a line at the turtle's position
ENDL - Mark the end of a line at the turtle's position; causes the line to be drawn from start to end using the currently-set color.
RTCL x - Rotate canvas x degrees clockwise (this requires some matrix math, and you will lose anything you've drawn that falls outside the canvas after rotation)
RTCC x - Rotate canvas x degrees counter-clockwise (ditto)
CNTR - Place turtle in the very center of the canvas. Useful when defining an initial position from which to begin, or to avoid reversing a number of complex movements to get back to the center and draw again.
CLRS - Remove all drawn lines from the pad. This, along with CNTR, should probably be the first two commands in a "program" to draw any particular shape, but if you omit them, the program can build on itself iteratively by being run on top of its previous output to create fractal patterns.
I just gave you 11 primitive commands that could be used to move a cursor from place to place on a canvas, drawing lines as you go, and could draw any 2D shape the user wished. You could also use a forward-backward-turn left-turn right model, as if the turtle were a robot, but that would probably make this more complex than it has to be (remember YAGNI; it will serve you well in industry).
Once you have the language, you have to make it work 2 ways; first, the program has to know how to interpret the instructions entered in a textbox in order to draw/redraw the shape on a drawing pad, and second, the program has to accept mouse input on the drawing pad, interpret those commands as moving the turtle/marking start or end/setting colors, and enter the commands into the textbox. That's your project, and I leave its implementation to you.
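To give an idea of the text-to-drawing half, here is a rough skeleton of how such an interpreter could look in C#; only the movement and line primitives are handled, SETC, the rotations and the drawing-pad-to-text direction are left out, and every name here is just a placeholder, not part of the assignment:

using System;
using System.Drawing;

// Very small interpreter skeleton for turtle-style commands such as
// "MVUP 10" or "STLN", one command per line.
class TurtleInterpreter
{
    PointF turtle;        // current turtle position
    PointF lineStart;     // position recorded by STLN
    readonly Graphics g;
    readonly Size canvas;

    public TurtleInterpreter(Graphics graphics, Size canvasSize)
    {
        g = graphics;
        canvas = canvasSize;
        turtle = new PointF(canvasSize.Width / 2f, canvasSize.Height / 2f);
    }

    public void Run(string program)
    {
        foreach (string line in program.Split('\n'))
        {
            string[] parts = line.Trim().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length == 0) continue;

            switch (parts[0].ToUpperInvariant())
            {
                case "MVUP": turtle.Y -= float.Parse(parts[1]); break;
                case "MVDN": turtle.Y += float.Parse(parts[1]); break;
                case "MVLT": turtle.X -= float.Parse(parts[1]); break;
                case "MVRT": turtle.X += float.Parse(parts[1]); break;
                case "CNTR": turtle = new PointF(canvas.Width / 2f, canvas.Height / 2f); break;
                case "STLN": lineStart = turtle; break;
                case "ENDL": g.DrawLine(Pens.Black, lineStart, turtle); break;
                default: throw new InvalidOperationException("Unknown primitive: " + parts[0]);
            }
        }
    }
}

The designer direction is then just the inverse: every mouse action on the drawing pad appends the corresponding command string to the textbox and the interpreter re-runs the whole program.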
I recommend you implement Context-Free Art.
It will be relatively ugly to convert drawing to code - you'll inevitably describe a set of fixed shapes at fixed locations. And all your classmates will do likewise.
But by implementing the context-free grammar, you'll be able to generate absolutely stunning pictures from code. This will be massively satisfying, will spur you to polish it and pass, and you'll be left with something to show off afterwards.
It sounds like the exercise is about generating and parsing something like SVG. It also sounds like the professor is not completely clear on what a compiler is. Translating from text to graphics might be within some academic definition of compiler, but it certainly doesn't reflect any real world meaning (in spite of work done on graphical programming languages).
Perhaps to fulfill the requirement of "compiler" you could translate a subset of SVG (which is textual) to JavaScript commands for generating the graphics. For the meaning of "primitive" you could just google it. Sort of, show some initiative.
Then you could do your user interface in HTML, which would fit the time frame better than anything else I can think of. Essentially this would be something like Google Documents Draw, except with display of the drawing's representation as SVG.
I'm trying to design an auto-focus system for a low-cost USB microscope. I have been developing the hardware side with a precision stepper (PAP) motor that is able to adjust the focus knob on the microscope, and now I'm at the hard part.
I have been thinking about how to implement the software. The hardware has two USB ports, one for the microscope camera and another for the motor. My initial idea is to write an application in C# that is able to get the image from the microscope and to move the motor back and forth; so far so good :)
Now I need a bit of help with the auto-focus: how do I implement it? Is there any good algorithm for this? Or maybe an image processing library that will help me with my task?
I have been googling but with no success... I'll appreciate any help/idea/recommendation!
Many thanks :)
EDIT: Thanks, guys, for your answers. I'll try all the options and get back here with the results (or maybe more questions).
The most important piece is code which tells you how much out of focus the image is. Since an unfocused image loses high-frequency data, I'd try something like the following:
long CalculateFocusQuality(byte[,] pixels)
{
    int width = pixels.GetLength(0);
    int height = pixels.GetLength(1);

    // Sum the squared differences between neighbouring pixels: a sharp image has
    // strong local contrast, so the sum is larger when the image is in focus.
    long sum = 0;
    for (int y = 0; y < height - 1; y++)
        for (int x = 0; x < width - 1; x++)
        {
            sum += Square(pixels[x + 1, y] - pixels[x, y]);
            sum += Square(pixels[x, y] - pixels[x, y + 1]);
        }
    return sum;
}
int Square(int x)
{
return x*x;
}
This algorithm doesn't work well if the image is noisy. In that case you could downsample it, or use a more complex algorithm.
Or another idea is to calculate the variance of the pixel values:
long CalculateFocusQuality(byte[,] pixels)
{
    int width = pixels.GetLength(0);
    int height = pixels.GetLength(1);

    // Accumulate the sum and sum of squares of all pixel values; the result is
    // proportional to the variance of the image, which drops as focus is lost.
    long sum = 0;
    long sumOfSquares = 0;
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
        {
            byte pixel = pixels[x, y];
            sum += pixel;
            sumOfSquares += pixel * pixel;
        }
    return sumOfSquares * width * height - sum * sum;
}
These functions work on monochromatic images; for RGB images, just sum the values of the channels.
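For completeness, a quick (and slow, GetPixel-based) way to get such a monochrome byte[,] from a System.Drawing Bitmap frame is sketched below; use LockBits instead if this has to run on every camera frame:

// Convert a Bitmap frame to a grayscale byte[,] suitable for the focus metrics above.
static byte[,] ToGrayscale(Bitmap frame)
{
    var pixels = new byte[frame.Width, frame.Height];
    for (int y = 0; y < frame.Height; y++)
        for (int x = 0; x < frame.Width; x++)
        {
            Color c = frame.GetPixel(x, y);
            pixels[x, y] = (byte)((c.R + c.G + c.B) / 3);
        }
    return pixels;
}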
Using this function, change the focus, trying to maximize CalculateFocusQuality. Increase the step size if several attempts in a row improved the quality, and decrease it and reverse the direction if the step reduced the quality.
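As a sketch of that search loop: moveFocus and captureFrame below are placeholders for whatever talks to your motor and camera (ToGrayscale is the helper above), so this is only an outline of the hill climbing, not working hardware code:

// Simple hill-climbing autofocus: keep stepping while the quality improves,
// speed up after several improvements in a row, and reverse direction and
// shrink the step when a move makes things worse.
void AutoFocus(Action<int> moveFocus, Func<Bitmap> captureFrame)
{
    int step = 8;              // initial step size in motor steps
    int direction = +1;
    int improvedInARow = 0;
    long best = CalculateFocusQuality(ToGrayscale(captureFrame()));

    while (step >= 1)
    {
        moveFocus(direction * step);
        long quality = CalculateFocusQuality(ToGrayscale(captureFrame()));

        if (quality > best)
        {
            best = quality;
            if (++improvedInARow >= 3) { step *= 2; improvedInARow = 0; }
        }
        else
        {
            moveFocus(-direction * step);   // undo the unproductive move
            direction = -direction;         // try the other direction
            step /= 2;                      // and refine the step size
            improvedInARow = 0;
        }
    }
}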
Autofocusing a microscope is a long-standing topic in optical research.
You can learn a bit about the involved algorithms here.
The problems involved are not only how to measure defocus, but also how to move the optical axis in an optimal way, and how to algorithmically correct the residual aberrations.
HTH!
Just some of my experiences trying to solve a similar task. On my system a 200x magnification is used, with a stepper resolution in the Z-direction of 0.001 µm.
The problems I've faced:
- Shaking. The image at a theoretically better position could be evaluated as worse because of sudden shaking. As the API of my system didn't allow moving the z-axis and capturing images in parallel, I had to move in steps and capture sequentially. Each move-stop caused shaking. Interestingly, the shaking was more severe while moving down than while moving up.
- Mechanical imprecision. Making a scan and moving to the theoretically best position may introduce an error, because the stepper position in the controller may not be the same as the mechanical position.
- Exposure. Depending on the application, the brightness of the image may vary, so the exposure should be adjusted. Depending on the focus-evaluation algorithm (whether brightness is involved in the calculation or not), the exposure may need to be fixed. That results in a chicken-and-egg problem: how to set up the exposure if the image brightness is unknown, and how to focus if the required exposure is unknown.
Finally, to avoid the mechanical problems, I stored the best image found while focusing and returned it at the end.
Concerning the algorithm for the focus value, the best for me was looking for edges combined with the total number of colors (histogram width). But of course, it depends on the type of image you process.
Regards,
Valentin Heinitz
There's some information on Wikipedia
Technically it can be implemented as high pass filter and some system which conscientiously moves lens around the point where filter output is highest. Digital processing is not required.
Also, 5 out of the first 6 matches I get from googling for "autofocus algorithm" seem to have relevant and useful information (although in one or two cases the full details of the papers require payment).