I am currently working on a project to retrieve mp3 files from the audio of YouTube videos.
I started my project with some basic parsing of youtube links and came up with a regex to extract the id.
private static string GetYoutubeID(string url)
{
var match = youtubeRegex.Match(url);
if(match.Success)
{
return match.Groups[0].Value ?? throw new ArgumentException("url");
}
else
{
throw new ArgumentException(url);
}
}
Then i am using the id in a request like so:
public static string GetResponse(string id)
{
using (WebClient client = new WebClient())
{
return client.DownloadString($"http://youtube.com/get_video_info?video_id={id}");
}
}
So now this call returns a string with tags like "status" and "errorcode". There is also one particular tag named "url_encoded_fmt_stream_map". In it I can find the urls to download the videos in various qualities.
For Exampe:
https%3A%2F%2Fr2---sn-h0jeened.googlevideo.com%2Fvideoplayback...
But my query seemed to fail and the result delivered me an empty "url_encoded_fmt_stream_map"-tag with "status=fail" and the "errorcode=150" which is "video unavailable".
From my understanding the libraries out there to help downloading youtube videos have some similar problems.
So my question is, is my method reliable? If so, what am I doing wrong?
And if it is not reliable, how can i reliably retrieve a link to a video stream?
I was able to fix my problem. After some more trying and searching I was able to change the request url to achieve a result.
For anyone interested:
https://www.youtube.com/get_video_info?video_id={id}
Can be changed to:
https://www.youtube.com/get_video_info?video_id={id}&el=detailpage
This returns a full answer for me and the error 150 does not occur anymore for me.
Related
I have been trying to read/write metadata of mp3 files.I searched around a bit and found out i could use Taglibsharp in order to do it.Reading the metadata works perfectly however when i try to write in order to change a tag for example the title nothing happens.Also there is no documentation for taglibsharp so if someone who has previously used it knows what i can do , please help me.Thanks in advance!
Some code:
var tfile = TagLib.File.Create(songPath);
//Reading: works perfectly
string title = tfile.Tag.Title;
//writing: does nothing
tfile.Tag.Title = "A title";
You need to call the .Save() method:
using (var tfile = TagLib.File.Create(songPath))
{
tfile.Tag.Title = "A title";
tfile.Save();
}
Is there an API to use Onenote OCR capabilities to recognise text in images automatically?
If you have OneNote client on the same machine as your program will execute you can create a page in OneNote and insert the image through the COM API. Then you can read the page in XML format which will include the OCR'ed text.
You want to use
Application.CreateNewPage to create a page
Application.UpdatePageContent to insert the image
Application.GetPageContent to read the page content and look for OCRData and OCRText elements in the XML.
OneNote COM API is documented here: http://msdn.microsoft.com/en-us/library/office/jj680120(v=office.15).aspx
When you put an image on a page in OneNote through the API, any images will automatically be OCR'd. The user will then be able to search any text in the images in OneNote. However, you cannot pull the image back and read the OCR'd text at this point.
If this is a feature that interests you, I invite you to go to our UserVoice site and submit this idea: http://onenote.uservoice.com/forums/245490-onenote-developers
update: vote on the idea: https://onenote.uservoice.com/forums/245490-onenote-developer-apis/suggestions/10671321-make-ocr-available-in-the-c-api
-- James
There is a really good sample of how to do this here:
http://www.journeyofcode.com/free-easy-ocr-c-using-onenote/
The main bit of code is:
private string RecognizeIntern(Image image)
{
this._page.Reload();
this._page.Clear();
this._page.AddImage(image);
this._page.Save();
int total = 0;
do
{
Thread.Sleep(PollInterval);
this._page.Reload();
string result = this._page.ReadOcrText();
if (result != null)
return result;
} while (total++ < PollAttempts);
return null;
}
As I will be deleting my blog (which was mentioned in another post), I thought I should add the content here for future reference:
Usage
Let's start by taking a look on how to use the component: The class OnenoteOcrEngine implements the core functionality and implements the interface IOcrEngine which provides a single method:
public interface IOcrEngine
{
string Recognize(Image image);
}
Excluding any error handling, it can be used in a way similar to the following one:
using (var ocrEngine = new OnenoteOcrEngine())
using (var image = Image.FromFile(imagePath))
{
var text = ocrEngine.Recognize(image);
if (text == null)
Console.WriteLine("nothing recognized");
else
Console.WriteLine("Recognized: " + text);
}
Implementation
The implementation is far less straight-forward. Prior to Office 2010, Microsoft Office Document Imaging (MODI) was available for OCR. Unfortunately, this no longer is the case. Further research confirmed that OneNote's OCR functionality is not directly exposed in form of an API, but the suggestions were made to manually parse OneNote documents for the text (see Is it possible to do OCR on a Tiff image using the OneNote interop API? or need a document to extract text from image using onenote Interop?. And that's exactly what I did:
Connect to OneNote using COM interop
Create a temporary page containing the image to process
Show the temporary page (important because OneNote won't perform the OCR otherwise)
Poll for an OCRData tag containing an OCRText tag in the XML code of the page.
Delete the temporary page
Challenges included the parsing of the XML code for which I decided to use LINQ to XML. For example, inserting the image was done using the following code:
private XElement CreateImageTag(Image image)
{
var img = new XElement(XName.Get("Image", OneNoteNamespace));
var data = new XElement(XName.Get("Data", OneNoteNamespace));
data.Value = this.ToBase64(image);
img.Add(data);
return img;
}
private string ToBase64(Image image)
{
using (var memoryStream = new MemoryStream())
{
image.Save(memoryStream, ImageFormat.Png);
var binary = memoryStream.ToArray();
return Convert.ToBase64String(binary);
}
}
Note the usage of XName.Get("Image", OneNoteNamespace) (where OneNoteNamespace is the constant "http://schemas.microsoft.com/office/onenote/2013/onenote" ) for creating the element with the correct namespace and the method ToBase64 which serializes an GDI-image from memory into the Base64 format. Unfortunately, polling (See What is wrong with polling? for a discussion of the topic) in combination with a timeout is necessary to determine whether the detection process has completed successfully:
int total = 0;
do
{
Thread.Sleep(PollInterval);
this._page.Reload();
string result = this._page.ReadOcrText();
if (result != null)
return result;
} while (total++ < PollAttempts);
Results
The results are not perfect. Considering the quality of the images, however, they are more than satisfactory in my opinion. I could successfully use the component in my project. One issue remains which is very annoying: Sometimes, OneNote crashes during the process. Most of the times, a simple restart will fix this issue, but trying to recognise text from some images reproducibly crashes OneNote.
Code / Download
Check out the code at GitHub
not sure about OCR, but the documentation site for onenote API is this
http://msdn.microsoft.com/en-us/library/office/dn575425.aspx#sectionSection1
Is there an API to use Onenote OCR capabilities to recognise text in images automatically?
If you have OneNote client on the same machine as your program will execute you can create a page in OneNote and insert the image through the COM API. Then you can read the page in XML format which will include the OCR'ed text.
You want to use
Application.CreateNewPage to create a page
Application.UpdatePageContent to insert the image
Application.GetPageContent to read the page content and look for OCRData and OCRText elements in the XML.
OneNote COM API is documented here: http://msdn.microsoft.com/en-us/library/office/jj680120(v=office.15).aspx
When you put an image on a page in OneNote through the API, any images will automatically be OCR'd. The user will then be able to search any text in the images in OneNote. However, you cannot pull the image back and read the OCR'd text at this point.
If this is a feature that interests you, I invite you to go to our UserVoice site and submit this idea: http://onenote.uservoice.com/forums/245490-onenote-developers
update: vote on the idea: https://onenote.uservoice.com/forums/245490-onenote-developer-apis/suggestions/10671321-make-ocr-available-in-the-c-api
-- James
There is a really good sample of how to do this here:
http://www.journeyofcode.com/free-easy-ocr-c-using-onenote/
The main bit of code is:
private string RecognizeIntern(Image image)
{
this._page.Reload();
this._page.Clear();
this._page.AddImage(image);
this._page.Save();
int total = 0;
do
{
Thread.Sleep(PollInterval);
this._page.Reload();
string result = this._page.ReadOcrText();
if (result != null)
return result;
} while (total++ < PollAttempts);
return null;
}
As I will be deleting my blog (which was mentioned in another post), I thought I should add the content here for future reference:
Usage
Let's start by taking a look on how to use the component: The class OnenoteOcrEngine implements the core functionality and implements the interface IOcrEngine which provides a single method:
public interface IOcrEngine
{
string Recognize(Image image);
}
Excluding any error handling, it can be used in a way similar to the following one:
using (var ocrEngine = new OnenoteOcrEngine())
using (var image = Image.FromFile(imagePath))
{
var text = ocrEngine.Recognize(image);
if (text == null)
Console.WriteLine("nothing recognized");
else
Console.WriteLine("Recognized: " + text);
}
Implementation
The implementation is far less straight-forward. Prior to Office 2010, Microsoft Office Document Imaging (MODI) was available for OCR. Unfortunately, this no longer is the case. Further research confirmed that OneNote's OCR functionality is not directly exposed in form of an API, but the suggestions were made to manually parse OneNote documents for the text (see Is it possible to do OCR on a Tiff image using the OneNote interop API? or need a document to extract text from image using onenote Interop?. And that's exactly what I did:
Connect to OneNote using COM interop
Create a temporary page containing the image to process
Show the temporary page (important because OneNote won't perform the OCR otherwise)
Poll for an OCRData tag containing an OCRText tag in the XML code of the page.
Delete the temporary page
Challenges included the parsing of the XML code for which I decided to use LINQ to XML. For example, inserting the image was done using the following code:
private XElement CreateImageTag(Image image)
{
var img = new XElement(XName.Get("Image", OneNoteNamespace));
var data = new XElement(XName.Get("Data", OneNoteNamespace));
data.Value = this.ToBase64(image);
img.Add(data);
return img;
}
private string ToBase64(Image image)
{
using (var memoryStream = new MemoryStream())
{
image.Save(memoryStream, ImageFormat.Png);
var binary = memoryStream.ToArray();
return Convert.ToBase64String(binary);
}
}
Note the usage of XName.Get("Image", OneNoteNamespace) (where OneNoteNamespace is the constant "http://schemas.microsoft.com/office/onenote/2013/onenote" ) for creating the element with the correct namespace and the method ToBase64 which serializes an GDI-image from memory into the Base64 format. Unfortunately, polling (See What is wrong with polling? for a discussion of the topic) in combination with a timeout is necessary to determine whether the detection process has completed successfully:
int total = 0;
do
{
Thread.Sleep(PollInterval);
this._page.Reload();
string result = this._page.ReadOcrText();
if (result != null)
return result;
} while (total++ < PollAttempts);
Results
The results are not perfect. Considering the quality of the images, however, they are more than satisfactory in my opinion. I could successfully use the component in my project. One issue remains which is very annoying: Sometimes, OneNote crashes during the process. Most of the times, a simple restart will fix this issue, but trying to recognise text from some images reproducibly crashes OneNote.
Code / Download
Check out the code at GitHub
not sure about OCR, but the documentation site for onenote API is this
http://msdn.microsoft.com/en-us/library/office/dn575425.aspx#sectionSection1
The following C# code downloads the JSON of a Reddit random page. It is correctly downloading and looping if the value found is not valid. However, the returned string is the same for about a minute straight of checking. Does anybody know if this is a memory problem, a Reddit API problem, or a webClient problem?
string src = "";
while(src.endsWith(<someString>))
{
src = dl(<valid site>);
}
void dl(string st)
{
string json = new WebClient().DownloadString(string);
...
string src = <manipulation of json>;
...
return src;
}
If you're fetching the same reddit url, you are hitting the akamai cache. While you can bypass the akamai cache by visiting api.reddit.com instead of www.reddit.com, your program should respect reddit's API rules, which include, "don't hit the same page more than once per 30 seconds."
I was trying to do something similar to the PHP command get_file_contents to get the market up a webpage, but it has to be in ASP.net. I wasnt sure if i could just open the webpage via file, and then just read the stream into another file.
One thought iw as thinking was: How to read an entire file to a string using C#? but i wasnt sure if that was the route i would be needing to go or if someone else knew something that would do what i wanted.
Looking for the answer to the following pseudocode:
$test = "www.google.com";
$string = get_file_contents($test);
$filePointer = fopen("sample.txt", "w+");
$filePointer.write($string);
fclose($filePointer);
string html;
using (var client = new WebClient())
{
html = client.DownloadString("http://www.google.com");
}
Console.WriteLine(html);
figured this might be what i was looking at: http://www.devprise.com/2006/07/14/c-method-to-mimic-php-file_get_contents/ Not sure though.