So here is the thing: I have my final year project coming up, and my idea is a video search engine.
It will do the following: take the user's query, whatever he/she wants to search for, and then search the video frame by frame. I know this might take a lot of time.
There will actually be two steps. In the pre-processing stage an algorithm will put tags on the videos, like YouTube does, only this time the tagging will be done automatically by an algorithm, and that algorithm is the part I don't know.
I just need an initial push to start.
Is there any algorithm that will give the result I want?
PS: This will only work for video lectures. If there are any other ideas, please do tell!
You need to break the problem into its component parts first, as there will be no single solution or algorithm that does what you want (otherwise your senior project would already be done for you).
From what I can tell, these are the parts:
1. Get a video stream.
2. Split the video stream into relevant chunks to process in detail. Look for more than, say, a 30% change in a short time span (like a blackboard being erased); a rough sketch of this follows the list.
3. Process the chunk in detail, either passing it to the next step or splitting the chunk into two smaller chunks (maybe look for a smaller change over a longer time span).
4. OCR the text.
5. Detect whether the previous chunk has the same text; if so, throw the current chunk out (you split too finely in step 2 or 3).
6. Store the OCR data in a database of some sort, with the time index of the text.
7. Build a program to query that database for student use.
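As a very rough illustration of the splitting step, here is a sketch using Emgu CV (the .NET OpenCV wrapper mentioned in another answer further down); the input file name and the 30% threshold are placeholder assumptions, not anything from the original question:

using System;
using Emgu.CV;
using Emgu.CV.CvEnum;

// Sketch: flag a chunk boundary wherever more than ~30% of the pixels change
// between consecutive frames (e.g. a blackboard being erased).
class ChunkSplitter
{
    static void Main()
    {
        using var capture = new VideoCapture("lecture.mp4"); // placeholder input file
        var prev = new Mat();
        var gray = new Mat();
        var diff = new Mat();
        int frameIndex = 0;

        Mat curr;
        while ((curr = capture.QueryFrame()) != null)
        {
            CvInvoke.CvtColor(curr, gray, ColorConversion.Bgr2Gray);
            if (!prev.IsEmpty)
            {
                CvInvoke.AbsDiff(prev, gray, diff);
                CvInvoke.Threshold(diff, diff, 30, 255, ThresholdType.Binary);
                double changed = CvInvoke.CountNonZero(diff) / (double)(diff.Rows * diff.Cols);
                if (changed > 0.30)
                    Console.WriteLine($"Chunk boundary at frame {frameIndex}");
            }
            gray.CopyTo(prev);
            curr.Dispose();
            frameIndex++;
        }
    }
}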
Each of those steps will have sub-steps of its own; you can use the same divide-and-conquer technique to figure out how to do each one.
If you need help with any one of those individual steps, let us know in a new question (one topic per question, please).
This is a stand-alone application. The data doesn't need to be saved for later and will not be shared between users; it's for one user to input their data and carry out an assessment. The user can then note down the results, but none of this needs to be stored in the app for later.
It's a two-stage assessment. In the first stage the user fills out a number of forms with structural details of columns, depending on how many columns they have. The second stage needs to sum some of those column values and average others, then display the final values of the assessment on a final form.
Only numbers are entered into each form, and the results are also numbers, which are then displayed on a graph. There are around 30 text boxes of user input, entered on an input form that pops up from a button on the parent form; stage one of the assessment is then carried out for each column on each parent form. Each parent form is rendered onto a new tab using the EasyTabs solution from "Creating a C# Application with Chrome-Style Tabs using EasyTabs in WinForms" (YouTube) that I found online.
I'm an absolute beginner in C#, so I couldn't figure out how I would take a value from each form and display the sum or average on the final form when I don't know how many forms there will be for each user at run time, since each user will have a different amount. I was thinking it's maybe some sort of loop during run time, but I'm just not sure what that would look like.
After speaking to a friend, they recommended a single form with a save and refresh button, where the data gets saved to a file and then retrieved. But there are, say, 10 different user input values that need to be picked up from each form and then averaged or summed. I started learning about StreamReader and StreamWriter and how files work in C#, but it was really difficult to figure out how to lay out the data in the file, how to get C# to sum the correct values together, and so on.
What would be the best way to approach this problem?
Thank you for your help.
If by "Form" you mean the WinForms definition of Form (Basically a new window), then I advise you to have a look at this post describing how to get data from one Form to the other.
To make a TextBox only accept numbers, you can use regular expression validation: ^[0-9]*$ if you want to allow empty input, or ^[0-9]+$ if you want at least one digit. If you want to allow negative numbers as well, use ^(\+|-)?[0-9]+$. Note: all of those allow leading zeroes. For decimals you need to allow a period as well: ^(\+|-)?[0-9]*(\.[0-9]+)?$. Read up on regular expressions if you need anything else. This website can help you design and test them.
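For example, a minimal sketch of hooking one of those patterns into a TextBox's Validating event might look like this (the control and form names are made up for illustration):

using System.ComponentModel;
using System.Text.RegularExpressions;
using System.Windows.Forms;

// Sketch: reject anything that is not an optionally signed decimal number.
public class InputForm : Form
{
    private static readonly Regex NumberPattern = new Regex(@"^(\+|-)?[0-9]*(\.[0-9]+)?$");

    // wire this up to inputTextBox.Validating in the designer or constructor
    private void inputTextBox_Validating(object sender, CancelEventArgs e)
    {
        var box = (TextBox)sender;
        if (!NumberPattern.IsMatch(box.Text))
        {
            e.Cancel = true;                        // keep focus in the box until it is valid
            MessageBox.Show("Please enter a number.");
        }
    }
}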
To get a number from your text box you need to parse the text in the TextBox. .NET has plenty of functions for that already, for example Int32.TryParse(), or Int32.Parse() if you want it to throw an exception on bad input. Others are similar. Have a look at the MSDN documentation to see how to use them.
As for averaging: making a database, connecting to it, and then using it to calculate the average of some numbers seems like a pretty roundabout way of doing things. In programming you tell the computer what to do. If someone told you to calculate an average, would you write the numbers in a book and then send them off to someone else to calculate for you? Surely not.
You can just add all the numbers together in your code and divide by the count of numbers. But there is an even easier route: add all your numbers to a List<int>, call its .Sum() method, and then divide by the return of its .Count() method.
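A minimal sketch of that idea, assuming you first collect the Text of each box into a list (the variable names and literal values here are placeholders):

using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: parse the text box contents, then sum and average them.
var columnValues = new List<int>();
foreach (var text in new[] { "12", "7", "23" })        // stand-ins for TextBox.Text values
{
    if (Int32.TryParse(text, out int value))           // silently skips non-numeric input
        columnValues.Add(value);
}

int sum = columnValues.Sum();
double average = (double)sum / columnValues.Count;     // LINQ's .Average() does this in one call
Console.WriteLine($"Sum = {sum}, Average = {average}");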
Saving the data to a file is only needed if you want to shut the program down between user inputs or want to document the inputs.
Writing and reading numbers from a file is fairly straightforward, though. Have a look at these tutorials provided by Microsoft: Write to file, Read from file, Read file one line at a time. Writing to files uses the exact same methods as System.Console; you just use a file stream instead of standard I/O.
As for data layouts, there are several sane variants. For something simple you can make up your own (i.e. the position in the file dictates where the data goes, or each data point comes with a descriptor before it, for example inputA: 24). For more complex data, I would consider creating a class for your data and then serializing it, for example using the popular Newtonsoft JSON library or XML, and not worrying about how the file is structured.
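For instance, a sketch of the serialization route with Newtonsoft.Json; the ColumnData class, its properties, and the file name are all made up for illustration:

using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Sketch: write one column's inputs to a JSON file and read them back later.
var column = new ColumnData { Name = "Column 1", Inputs = new List<int> { 12, 7, 23 } };
File.WriteAllText("column1.json", JsonConvert.SerializeObject(column, Formatting.Indented));

var loaded = JsonConvert.DeserializeObject<ColumnData>(File.ReadAllText("column1.json"));

public class ColumnData
{
    public string Name { get; set; }
    public List<int> Inputs { get; set; }
}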
I have thousands of jpegs in a folder structure. These images are snapshots of my driveway at 2560 x 1440, taken and stored every 60 seconds.
I'd like to create a program that can detect, by analyzing an image, whether I or my wife was home at that particular time. I have a red car, she has a bright yellow car, so a simple color threshold should probably suffice. Another clear distinction is that we each have our own spot and never park in the other's. Also, other people don't use the driveway (and if they do, I don't mind a false positive). One minor complication is that the cameras switch to black/white in the dark (but that may be when the parking spot, rather than the color, comes in handy).
So I was hoping I could use ML.Net and train a model with some hand-annotated images, where I tag each image with whether I see my car or her car in the driveway. I was thinking of annotating maybe 100 to a couple of hundred images for day and another set for night, feeding all these images to ML.Net to train it, then having it analyse a few hundred more where I can manually check the results and correct any mistakes, and creating a sort of feedback loop to train on a few hundred more images.
Once the training is complete I'd like to analyze all images currently stored and each new image as it comes in to generate some data on when I'm (or my wife is) home, away etc.
My problem (and this is probably going to be the reason for the question being closed as "too broad" or something) is: I have no clue how to do this. I have seen awesome tutorials that make it all seem like child's play, but when I then try to do this in C# (my language of choice) and look for ML.Net how-tos, I can't find anything that points me in the right direction.
For example: Train a machine learning model with data that's not in a text file. I'm a competent programmer, so it's peanuts to create a CSV file / database / whatever that has "1.jpg -> rob home, wife not home" data. But the how-to doesn't explain how to feed the image into ML.Net, and I haven't been able to find anything that does. The most probable cause is that I'm new to ML(.Net), and probably that I'm too stubborn to give up trying to accomplish this in C#, but the information available is, weird as it sounds, overwhelming yet scarce. It usually leads me down some rabbit hole, only to find out far too late that it's not what I want, and I can't find anything that hints at me going in the right direction.
So long story short; tl;dr:
How do I feed images into ML.Net, how do I tell ML.Net that my/her car is in the driveway for any given image (training) and how do I get ML.Net to tell me whether it thinks I'm / my wife is home or not for a given image? Or is this not possible (currently)? I'm NOT looking for complete code but for pointers, hints, links, tutorials, examples or whatever may help me in the right direction.
You might find something useful here: Image recognition/classification using Microsoft ML.net 0.2 (Machine learning)
However, I would encourage you to consider Python as the weapon of choice for your task.
There you would just store the images in different folders according to the label (you home, your wife home, both home, no car in the driveway, other) and you are ready to go.
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
It probably won't take you more than a weekend, and that's including learning the basics of Python.
Edit:
It seems that ML.Net still does not support training image classification tasks: "Again, note that this sample only uses/consumes a pre-trained TensorFlow model with ML.NET API. Therefore, it does not train any ML.NET model. Currently, TensorFlow is only supported in ML.NET for scoring/predicting with existing TensorFlow trained models."
There is a thread about it here: https://github.com/dotnet/docs/issues/5379.
What you could try is Emgu (http://www.emgu.com/wiki/index.php/Main_Page) in combination with OpenCV. This https://www.geeksforgeeks.org/opencv-python-program-vehicle-detection-video-frame/ is an example in Python, but it should translate well to C++ or C# using Emgu. Once the car is detected, check its position and color. This approach would probably also avoid labeling any data.
Alternatively, use a pre-trained model (an h5 file), load it into ML.Net, and then check the position and mean color to tell whose car it is.
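A back-of-the-envelope sketch of the "check position and mean color" idea in C# with Emgu, skipping detection entirely; the spot rectangles, color thresholds, and file name are placeholder assumptions you would tune against your own snapshots:

using System;
using System.Drawing;
using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.Structure;

// Sketch: look at the mean colour of each known parking spot in the 2560x1440 snapshot.
class DrivewayCheck
{
    static void Main()
    {
        var mySpot = new Rectangle(200, 600, 800, 600);     // placeholder region of the red car's spot
        var herSpot = new Rectangle(1400, 600, 800, 600);   // placeholder region of the yellow car's spot

        using var image = CvInvoke.Imread("snapshot.jpg", ImreadModes.Color);

        Console.WriteLine($"Red car present:    {IsCarPresent(image, mySpot, lookForRed: true)}");
        Console.WriteLine($"Yellow car present: {IsCarPresent(image, herSpot, lookForRed: false)}");
    }

    static bool IsCarPresent(Mat image, Rectangle spot, bool lookForRed)
    {
        using var roi = new Mat(image, spot);
        MCvScalar mean = CvInvoke.Mean(roi);                 // channel order is BGR
        double b = mean.V0, g = mean.V1, r = mean.V2;

        // crude colour rules; the black/white night frames would need a
        // different check, e.g. overall brightness inside the spot
        return lookForRed
            ? r > 120 && r > g * 1.5 && r > b * 1.5
            : r > 120 && g > 120 && b < 100;                 // yellow = red + green, little blue
    }
}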
I would like to know if there is a way to save MultiSourceFrames to disk in such a way that I can load them up to use later.
The reason for this is that I have far too much processing to do on each frame to reasonably perform it live. I have no need to process the frames in real time, so I would like to find a way to save a number of frames to disk (or even to memory?) and perform my processing afterwards.
So far, I have tried storing these in a List<MultiSourceFrame> but, for each frame, I find that I can't then acquire the ColourFrame component (for example), presumably because the whole object structure is not saved.
Potential Solution Idea?
I know that Kinect Studio is able to save .xed files but I really need to be able to do this from code. Moreover, I don't know whether I can turn the .xed file back into a collection of MultiSourceFrames.
I'd be really grateful if anyone can help me out with this problem! I promise to upvote/accept helpful answers!
You can't just save the MultiSourceFrame object. Instead, you should extract the (raw) data you need from the frames and save that.
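For example, here is a sketch of pulling the raw colour pixels out of a MultiSourceFrame and dumping them to disk; the helper class, method, and file path are illustrative, not part of the Kinect SDK:

using System.IO;
using Microsoft.Kinect;

// Sketch: copy the raw BGRA pixels out of the colour frame and save them
// for offline processing; do the same for depth/body data as needed.
static class FrameSaver
{
    public static void SaveColorFrame(MultiSourceFrame multiFrame, string path)
    {
        using (ColorFrame colorFrame = multiFrame.ColorFrameReference.AcquireFrame())
        {
            if (colorFrame == null) return;                  // the frame may already have expired

            FrameDescription desc = colorFrame.FrameDescription;
            byte[] pixels = new byte[desc.Width * desc.Height * 4];   // 4 bytes per pixel in BGRA
            colorFrame.CopyConvertedFrameDataToArray(pixels, ColorImageFormat.Bgra);

            File.WriteAllBytes(path, pixels);                // reload later with File.ReadAllBytes
        }
    }
}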
I've been working on a project over the last few days, and there is a task in it that I don't know how to do: the project includes analyzing web pages to find tags that characterize the page.
Hey buddy, what do you mean by tags? By tags I mean keywords that summarize what the web page is about. For example, here on SO you write your own tags so people can find your question more easily. What I am talking about is building an algorithm that analyzes a web page to find its tags from the text within the page.
I started with getting the text from the page -> accomplished
Generally, I'm looking for a way to find the keywords that capture what the webpage is about.
However, I don't really know what to do next. Does anyone have a suggestion?
For a really basic approach, you could use the TF-IDF algorithm to find the most important words on your page.
Quick overview from Wikipedia:
The tf–idf weight (term frequency–inverse document frequency) is a weight often used in information retrieval and text mining. This weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus. The importance increases proportionally to the number of times a word appears in the document but is offset by the frequency of the word in the corpus. Variations of the tf–idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document's relevance given a user query. tf–idf can be successfully used for stop-words filtering in various subject fields including text summarization and classification.
Once you find the most important words on your page, you can use them as tags.
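A toy sketch of plain TF-IDF in C# (the tiny corpus, the naive whitespace tokenizer, and taking the top three words are assumptions made purely for illustration):

using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: score each word of one page by term frequency times inverse
// document frequency across a small corpus, then print the top candidates.
class TfIdfExample
{
    static void Main()
    {
        string[][] docs = new[]
        {
            "machine learning algorithms analyze web pages",
            "web pages contain text and links",
            "learning to rank pages by relevance"
        }.Select(Tokenize).ToArray();

        string[] page = docs[0];                              // the page we want tags for

        Dictionary<string, double> scores = page.Distinct().ToDictionary(
            word => word,
            word =>
            {
                double tf = page.Count(w => w == word) / (double)page.Length;
                int docsWithWord = docs.Count(d => d.Contains(word));
                double idf = Math.Log(docs.Length / (double)docsWithWord);
                return tf * idf;
            });

        foreach (var kv in scores.OrderByDescending(kv => kv.Value).Take(3))
            Console.WriteLine($"{kv.Key}: {kv.Value:F3}");    // candidate tags
    }

    static string[] Tokenize(string text) =>
        text.ToLowerInvariant().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
}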
If you want to improve your tags and make them more relevant, there are many ways to proceed; one is as follows:
1. Extract a bunch of text for which you already know the main tags.
2. Run the TF-IDF algorithm over all of that text and build a vector from the terms with the highest scores.
3. Try to find a main direction across all these vectors (running a PCA, for example, or any other machine learning tool).
4. Use the words along that main direction (the largest vector of the PCA) as the tag representing that set.
Hope it's understandable and it helps
Typically you look for certain words surrounded by certain HTML. For example, titles are typically in a heading tag such as <h1>.
If you parse a page for all of its H1 tags, then it stands to reason that the content following each tag is related. An example is this very page: it has an H1 tag surrounding the question title, which gives Google a hint that the page is about "algorithm", "analyzing", "web pages", etc.
The hard part is determining context.
In our example here, the term "pages" is very generic and can relate to anything, but "web pages" is a bit more specific. You can handle this with an internal dictionary that is built up over time based on term frequency after analyzing a number of documents to find commonality. The frequency should provide a weighted value for determining the top X "tags" for a given page.
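A small sketch of that first step, here using the HtmlAgilityPack library (my choice of parser, not the answerer's; the URL is a placeholder):

using System;
using System.Linq;
using HtmlAgilityPack;

// Sketch: pull heading text out of a page and treat the words as candidate tags.
class HeadingTags
{
    static void Main()
    {
        HtmlDocument doc = new HtmlWeb().Load("https://example.com/some-article");

        var headings = doc.DocumentNode
            .Descendants()
            .Where(n => n.Name == "h1" || n.Name == "h2")
            .Select(n => n.InnerText.Trim())
            .Where(t => t.Length > 0);

        foreach (string heading in headings)
            Console.WriteLine(heading);   // feed these into the frequency-weighted dictionary
    }
}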
This is more of an Information Retrieval and Data Mining question. Reviewing some of Rao's lectures may help.
When you're spidering web pages, you're essentially trying to build an index. You do this by building a global Term-Frequency dictionary, where each word in the language (often stemmed to account for pluralization and other modifications) is stored as a key, and the number of times they occur in the document as values.
From there, you can use algorithms such as PageRank and Hubs and Authorities (HITS) to do data analysis.
You can implement a number of heuristics:
Acronyms and words in all uppercase
Words that are not frequent overall, i.e. discard words that appear in all or most documents and favour the ones that appear relatively frequently only in this one.
Sequences of words that always appear in the same order in this document and possibly in others as well
etc.
I'm looking for an algorithm (or some other technique) to read the actual content of news articles on websites and ignore anything else on the page. In a nutshell, I'm reading an RSS feed programmatically from Google News. I'm interested in scraping the actual content of the underlying articles. On my first attempt I took the URLs from the RSS feed, simply followed them, and scraped the HTML from each page. This very clearly resulted in a lot of "noise", whether HTML tags, headers, navigation, etc.: basically all the information that is unrelated to the actual content of the article.
Now, I understand this is an extremely difficult problem to solve, it would theoretically involve writing a parser for every website out there. What I'm interested in is an algorithm (I'd even settle for an idea) on how to maximize the actual content that I see when I download the article and minimize the amount of noise.
A couple of additional notes:
Scraping the HTML is simply the first attempt I tried. I'm not sold that this is the best way to do things.
I don't want to write a parser for every website I come across, I need the unpredictability of accepting whatever Google provides through the RSS feed.
I know whatever algorithm I end up with is not going to be perfect, but I'm interested in a best possible solution.
Any ideas?
As long as you've accepted the fact that whatever you try is going to be very sketchy given your requirements, I'd recommend you look into Bayesian filtering. This technique has proven to be very effective in filtering spam out of email.
When reading news outside of my RSS reader, I often use Readability to filter out everything but the meat of the article. It is JavaScript-based, so the technique would not directly apply to your problem, but the algorithm has a high success rate in my experience and is worth a look. Hope this helps.
Take a look at templatemaker (Google code homepage). The basic idea is that you request a few different pages from the same site, then mark down what elements are common across the set of pages. From there you can figure out where the dynamic content is.
Try running diff on two pages from the same site to get an idea of how it works. The parts of the page that are different are the places where there is dynamic (interesting) content.
Here's what I would do, after checking the robots.txt file to make sure it's fine to scrape the article and parsing the document as an XML tree:
1. Make sure the article is not broken into many pages. If it is, 'print view', 'single page' or 'mobile view' links may help bring it onto a single page. Of course, don't bother if you only want the beginning of the article.
2. Find the main content frame. To do that, I would count the amount of information in every tag. What we're looking for is a node that is big but consists of many small subnodes (a rough sketch of this step follows the list).
3. Now try to filter out any noise inside the content frame. Well, the websites I read don't put any crap there, only useful images, but you do need to kill anything that has inline JavaScript and any external links.
4. Optionally, flatten that into plain text (that is, walk the tree and open all elements; block elements create a new paragraph).
5. Guess the header. It's usually something with h1, h2 or at least a big font size, but you can simplify life by assuming that it somehow resembles the page title.
6. Finally, find the authors (something with names and email), the copyright notice (try metadata or the word copyright) and the site name. Assemble these somewhere together with the link to the original and state clearly that it's probably fair use (or whatever legal doctrine you feel applies to you).
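A rough sketch of step 2 using HtmlAgilityPack (the library mentioned in another answer below); the element filter and the text-length scoring rule are stand-ins for whatever "amount of information" heuristic you settle on:

using System;
using System.IO;
using System.Linq;
using HtmlAgilityPack;

// Sketch: score candidate container elements by how much text they hold
// directly (text nodes and <p> children) and pick the densest one.
class ContentFrameFinder
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(File.ReadAllText("article.html"));      // assumed local copy of the page

        HtmlNode best = doc.DocumentNode
            .Descendants()
            .Where(n => n.Name == "div" || n.Name == "article" || n.Name == "section")
            .OrderByDescending(n => n.ChildNodes
                .Where(c => c.NodeType == HtmlNodeType.Text || c.Name == "p")
                .Sum(c => c.InnerText.Trim().Length))
            .FirstOrDefault();

        Console.WriteLine(best?.XPath ?? "no candidate found");
    }
}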
There is an almost perfect tool for this job, Boilerpipe.
In fact it has its own tag here, boilerpipe, though it's little used. Here's the description right from the tag wiki:
The boilerpipe library for Java provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.
The source is all there in the project if you just want to learn the algorithms and techniques, but in fact somebody has already ported it to C# which is quite possibly perfect for your needs: NBoilerpipe.
BTE (Body Text Extraction) is a Python module that finds the portion of a document with the highest ratio of text to tags on a page.
http://www.aidanf.net/archive/software/bte-body-text-extraction
It's a nice, simple way of getting real text out of a website.
Here's my (probably naive) plan for how to approach this:
Assuming the RSS feed contains the opening words of the article, you could use these to locate the start of the article in the DOM. Walk back up the DOM a little (first parent DIV? first non-inline container element?) and snip. That should be the article.
Assuming you can get the document as XML (HtmlAgilityPack can help here), you could (for instance) grab all descendant text from <p> elements with the following LINQ to XML:
// concatenate the text of every <p> element, separating paragraphs with blank lines
string articleText = document
    .Descendants(XName.Get("p", "http://www.w3.org/1999/xhtml"))
    .Select(p => p
        .DescendantNodes()
        .Where(n => n.NodeType == XmlNodeType.Text)
        .Select(t => t.ToString()))
    .Where(c => c.Any())
    .Select(c => c.Aggregate((a, b) => a + b))
    .Aggregate((a, b) => a + "\r\n\r\n" + b);
We successfully used this formula for scraping, but it seems like the terrain you have to cross is considerably more inhospitable.
Obviously not a whole solution, but instead of trying to find the relevant content, it might be easier to disqualify non-relevant content. You could classify certain types of noise and work on smaller solutions that eliminate each of them. You could have advertisement filters, navigation filters, etc.
I think the larger question is whether you need one solution that works on a wide range of content, or whether you are willing to create a framework that you can extend and implement on a site-by-site basis. On top of that, how often are you expecting the underlying data sources to change (i.e. volatility)?
You might want to look at Latent Dirichlet Allocation, which is an IR technique for generating topics from the text data you have. This should help you reduce noise and get some precise information about what the page is about.