Is there any URL spider component that I can reuse? - C#

I'd like to build a spider tool that I can run against a website's root URL; it should then find all the broken and healthy links (images, CSS, .aspx pages, .doc files), with a configurable crawl depth, e.g. 2 levels.
At the end it should generate a map of the results, either as XML or as DataTables.
Is there any ready-made third-party or free tool that I can reuse in my .NET application?
Many thanks,

I've used the Chilkat ASP.NET Spider component before. It allows you to specify the number of levels to crawl, set up exclusions, etc.
It won't generate a map for you automatically (or at least the version I worked with didn't), but logging the results to either a database or XML should be fairly easy. Details on the component can be found here, and you can download the component for free from here.
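If you end up rolling your own instead, a depth-limited link checker that writes its results to XML can be sketched with WebClient and a regular expression. This is only a rough illustration; the class name, URL, and output file are placeholders, and a real crawler would need robots.txt handling, throttling, and smarter link parsing.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;
using System.Xml.Linq;

class LinkChecker
{
    // Very rough href/src extraction; an HTML parser would be more robust.
    static readonly Regex HrefPattern =
        new Regex("(?:href|src)=[\"'](?<url>[^\"'#]+)[\"']", RegexOptions.IgnoreCase);

    static void Main()
    {
        var results = new List<XElement>();
        Crawl(new Uri("http://example.com/"), depth: 2,
              visited: new HashSet<string>(), results: results);
        new XDocument(new XElement("links", results)).Save("linkmap.xml");
    }

    static void Crawl(Uri url, int depth, HashSet<string> visited, List<XElement> results)
    {
        if (depth < 0 || !visited.Add(url.AbsoluteUri)) return;

        string html = null;
        string status;
        try
        {
            using (var client = new WebClient())
                html = client.DownloadString(url);   // no exception => healthy
            status = "healthy";
        }
        catch (WebException ex)
        {
            status = "broken: " + ex.Message;        // 404s, timeouts, DNS failures, ...
        }

        results.Add(new XElement("link",
            new XAttribute("url", url.AbsoluteUri),
            new XAttribute("status", status)));

        if (html == null) return;

        foreach (Match m in HrefPattern.Matches(html))
        {
            Uri child;
            if (Uri.TryCreate(url, m.Groups["url"].Value, out child))
                Crawl(child, depth - 1, visited, results);
        }
    }
}
```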

Related

What's the best way to create a replication/load balancing module?

We have a SharePoint document library; the site consists of media files (such as images, Word documents, and .psd files). We also have a local CME (Alterian) which can be integrated with the SharePoint library in order to share the document library, but the site needs to be on http://, not https://; as it happens, the current SharePoint site is on https://. So we need to figure out a way to write a module that will run as a scheduled job (possibly using the SPJobDefinition class), check the https:// site for recently modified, added, or deleted documents/records, and then copy/normalize them to a dev site (hosted on http://, a replica of the production https:// site).
Experts, please share your views on the best approach to make this happen. (At the initial stage I'll also have to copy over all the existing metadata from the current https:// site.)
Thanks a lot in advance for the time.
Thanks a lot in advance for the time.
I would use event handlers on the https document library. Please see the SPItemEventReceiver.ItemAdded Method and SPItemEventReceiver.ItemUpdated Method.
So, every time an item is added or modified, the code inside those methods is triggered. Inside the code, you can take the library document and copy it to the http site.
Regarding the existing items, you could write a simple console application which will copy the items from one list to the other.
Make sure that you make use of the SPListItem.SystemUpdate Method.
Also, the following excerpt from an answer to the question Moving Documents from library to library deletes version history, how do you retain it? could be helpful for starting:
(...) We can get the "SPFile" and the "SPFileVersion" objects from the original library and add them to another library one by one. After copying a file or version, get the original custom properties from the source file or version and use the "SPListItem.SystemUpdate(false)" method to update the target file or version. This workaround can persist most of the properties except the "modified time" or "modified by" fields. (...)
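Putting those pieces together, a minimal sketch of the event receiver approach might look like the following. It assumes the SharePoint server object model is available; the target URL, library name, and copied fields are placeholders, and version history handling (per the excerpt above) is omitted.

```csharp
using Microsoft.SharePoint;

public class CopyToHttpSiteReceiver : SPItemEventReceiver
{
    public override void ItemAdded(SPItemEventProperties properties)
    {
        CopyToHttpSite(properties.ListItem);
    }

    public override void ItemUpdated(SPItemEventProperties properties)
    {
        CopyToHttpSite(properties.ListItem);
    }

    private static void CopyToHttpSite(SPListItem sourceItem)
    {
        if (sourceItem.File == null) return;

        byte[] content = sourceItem.File.OpenBinary();

        using (SPSite site = new SPSite("http://dev-site"))        // placeholder URL
        using (SPWeb web = site.OpenWeb())
        {
            SPFolder folder = web.GetFolder("Shared Documents");   // placeholder library
            SPFile copy = folder.Files.Add(sourceItem.File.Name, content, true);

            // Carry the metadata across, then persist without bumping
            // "Modified"/"Modified By" or creating a new version.
            SPListItem targetItem = copy.Item;
            targetItem["Title"] = sourceItem["Title"];
            targetItem.SystemUpdate(false);
        }
    }
}
```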

Best way to collect information from web page

I need to get information from a couple of web sites. For example, this site.
What would be the best way to get all the links from the page so that the information can be extracted?
Sometimes I need to click on a link to reach other links inside it.
I tried WatiN, and I tried doing the same from within Excel 2007 with the Web Data option.
Could you please suggest a better way that I am not aware of?
NCrawler might be very useful for deep-level crawling. You can also set MaxCrawlDepth to control how deep it goes.
Have a look at WGet. It is an incredibly powerful tool for mining the content of a single page or an entire website. The options available allow you to dictate how many levels deep to follow in terms of links, what to do with static resources such as images, how to handle relative links, etc. It also does a very good job of mining pages which are generated dynamically, such as those served by CGI or ASP.
It's been around for many years in the 'nix world but executables compiled for Windows are readily available.
You would need to kick it off from .NET using Process.Start but you could then pipe the results into multiple files (which mimic the original website structure), a single file, or into memory by capturing standard output. Then you can do subsequent analysis such as extracting HREF HTML elements (if it is only links you are interested in) or grabbing the sort of table data evident in the link you provide in your question.
I realise this is not a 'pure' .NET solution but the power WGET offers more than compensates for this, in my opinion. I have used it myself in the past, in this way, for exactly the sort of thing I think you are trying to do.
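As a hedged sketch of the Process.Start approach: the snippet below shells out to wget in link-checking mode and captures its log. The flags shown (--spider, --recursive, --level, --no-verbose) exist in standard wget builds, but exact behaviour varies by version, so check `wget --help` before relying on them; the URL and log file name are placeholders.

```csharp
using System.Diagnostics;

class WgetRunner
{
    static void Main()
    {
        var psi = new ProcessStartInfo
        {
            FileName = "wget.exe",
            // --spider checks links without saving files;
            // --recursive/--level control how deep to follow links.
            Arguments = "--spider --recursive --level=2 --no-verbose http://example.com/",
            UseShellExecute = false,
            RedirectStandardError = true   // wget writes its log to stderr
        };

        using (Process wget = Process.Start(psi))
        {
            string log = wget.StandardError.ReadToEnd();
            wget.WaitForExit();
            System.IO.File.WriteAllText("wget-log.txt", log);
        }
    }
}
```

From there you can do the subsequent analysis mentioned above, e.g. extracting HREFs or table data from the captured output or the mirrored files.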
I recommend using http://watin.org/. It is much simpler than wget :-)

Translating comments and region names in source code

Does anyone know of a batch processor or a VS 2010 plugin/script that would let me translate comments and region names from Chinese into English?
The only ones I've found either process all strings or only one string at a time.
I have two large C# projects that I am trying to read through.
Thanks.
Use PrepTags to prepare your file for translation. It will allow you to select the text to be translated based on regex.
www.preptags.com
You can work file by file for free, or process the files as a batch using the pro version (€39).
In your case, it's pretty simple to prepare. You just mark everything as protected, then unprotect the content of the comments & region names.
Disclosure: I develop PrepTags.
As was noted, you can use the Google Translate API or, alternatively, the Bing Translator API. You can detect comments and regions in your files using System.CodeDom.
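If System.CodeDom turns out to be awkward for comments, a simpler (if cruder) alternative is to pull comment and #region text out of each .cs file with regular expressions before sending it to whichever translation API you choose. This is only a sketch; the TranslateAsync call in the comment is a hypothetical placeholder, not a real API.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

class CommentExtractor
{
    static void Main()
    {
        string source = File.ReadAllText("Program.cs");

        // Single-line comments, block comments, and #region names.
        var patterns = new[]
        {
            new Regex(@"//(?<text>.*)$", RegexOptions.Multiline),
            new Regex(@"/\*(?<text>.*?)\*/", RegexOptions.Singleline),
            new Regex(@"^\s*#region\s+(?<text>.*)$", RegexOptions.Multiline)
        };

        foreach (var pattern in patterns)
            foreach (Match m in pattern.Matches(source))
                Console.WriteLine(m.Groups["text"].Value.Trim());
                // string english = TranslateAsync(m.Groups["text"].Value).Result; // hypothetical
    }
}
```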
I'm not too sure if this is possible. What you can do to help would be the following:
1) Make sure that both C# projects have the Properties > Build > XML documentation file check box checked.
2.1) Write an application that reads in the generated xml file.
2.2) Parse the file, and for each value make a call to Google Translate to get the translated value.
2.3) Place the translated value within another xml file that has the same structure as the one created from building the project.
This wouldn't solve your desire to translate the region names, but it's a start. At least you would have IntelliSense when using the two projects.
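A sketch of step 2 under those assumptions: load the compiler-generated XML documentation file, translate each summary, and save a copy with the same structure. The file names are examples, and TranslateAsync is a hypothetical placeholder for the Google Translate (or Bing) call.

```csharp
using System.Threading.Tasks;
using System.Xml.Linq;

class DocTranslator
{
    static async Task Main()
    {
        XDocument doc = XDocument.Load("MyProject.xml");

        foreach (XElement summary in doc.Descendants("summary"))
        {
            summary.Value = await TranslateAsync(summary.Value);   // hypothetical service call
        }

        doc.Save("MyProject.en.xml");
    }

    static Task<string> TranslateAsync(string text)
    {
        // Call your translation service of choice here.
        return Task.FromResult(text);
    }
}
```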
This is actually a good idea for a small open source project. I may decide to pick it up. If I do, I'll let you know.

Flash plugin that allows users to create a PDF or image file

I am developing an ASP.NET web site where users need to be able to create their own business cards. So I'm looking for a tool (most likely Flash) that I can easily integrate into a web site and that lets users add text and custom images to their cards and then produce an image and/or PDF from their work.
Is there a plugin that does this?
If you write your own business-card creator in flash, you can save the view to PDF files using AlivePDF.
You could have a look at this: http://www.shirtnetwork.com/en
It is less of a plugin and more of a fully customizable software solution, with an administration backend, PDF export, billing, etc. I worked on the client, and I must say it is a very mature and capable piece of software that can probably do just about anything you want when it comes to customizing products. I don't know whether you will like the pricing model; on the other hand, to my knowledge they also provide commission-free licences.
I don't know whether there are reasonably priced components available that do this for you; there is a lot of money to be made in this business, so I wouldn't expect anyone to give them away for free.
greetz
back2dos

Are there any performance issues or caveats with resource (.resx) files? [closed]

Resource files seem great for localization of labels and messages, but are they perfect?
For example:
Is there a better solution if there is a huge amount of resources? Like 100,000 strings in a .resx file? (Theoretically, I do not actually have this problem)
Is this a good method for storing the other types of data, such as images, icons, audio files, regular files, etc.?
Is it a best practice to store your .resx files in a stand-alone project for easier updates/compiling?
Are there any other issues that you have run into when using .resx files?
1. Is there a better solution if there is a huge amount of resources? Like 100,000 strings in a .resx file? (Theoretically, I do not actually have this problem)
I've used Alfresco as an alternative content repository on Java projects. RESX files, from a maintenance standpoint (because of encoding issues, I guess), can really stink.
2. Is this a good method for storing the other types of data, such as images, icons, audio files, regular files, etc.?
I've seen it work with images... but that's it (not sure about other media/files).
3. Is it a best practice to store your .resx files in a stand-alone project for easier updates/compiling?
I don't, but you can edit a resx file on a live site and the edit will go through, I believe. Certainly that's the way it works in development (except for the global resx, I think).
4. Are there any other issues that you have run into when using .resx files?
Besides being really annoying to maintain, and the fact that Visual Studio doesn't provide the neatest tools for working with them... no.
I recently used a .resx file with 5 million strings (of normal length, like this sentence), compiled into several DLLs totalling about 1 GB. It still works fine in an Azure web project.
The load time is hard to pin down, maybe a few seconds or so; since it can always warm up in stages, I never noticed it.
We have been using resource files on a relatively large .NET Windows Forms application (over 500 various forms, approximately 20 resource strings per form) and we've had no performance issues regarding resources from .resx files.
We have used Babylon.NET as a tool for managing translations (has a free version just for translators).
You did not specify whether your project will be a web or desktop application. One piece of functionality that resource files offer for desktop applications is the ability to also localize control positions and sizes, which IMHO is not possible using other tools (unless you are using something like the DevExpress layout control, which has automatic sizing).
Never seen any problems with resx resources; they are cached perfectly. We have used them in WinForms, ASP.NET MVC, WPF, etc.
One thing you should do is use the Microsoft MAT (Multilingual App Toolkit) extension for Visual Studio.
You can control your translations, export them to send to translators (e.g. excluding the locked translations), import them again, verify or comment on them, and recycle existing translations (saving you a lot of time!),
and it works with the industry-standard .xlf (XLIFF) format!
If you sign up for the Azure API you can even automatically translate resources (you get a few thousand words of free monthly credit on Azure).
See: https://multilingualapptoolkit.uservoice.com/knowledgebase/articles/1167898-microsoft-translator-moves-to-the-azure-portal
You can even see how much work has already been done in a project.
Oh and it comes with a handy editor which your translators can also use!
To get started:
install the MAT Visual Studio extension
Go to your project in Visual Studio
Click Properties --> open AssemblyInfo.cs
Add this attribute: [assembly: System.Resources.NeutralResourcesLanguage("en")]
Select your project in Solution Explorer and go in Visual Studio to [Tools] --> [Multilingual App Toolkit] --> [Enable selection]
This will add a new folder "MultilingualResources" to your project
Right-click your project --> [Multilingual App Toolkit] --> [Add translation languages…] --> select the language you want to translate (e.g. Dutch).
In the "MultilingualResources" folder you will see a new file "....nl.xlf"; double-click it and it will open with the Multilingual Editor. (If not, right-click it and change the default "Open With" to the Multilingual Editor.)
Now you only add strings to your default Resources.resx file (the language should be the same as the "NeutralResourcesLanguage" you added in AssemblyInfo.cs).
For the translations, you DON'T add strings to the ...nl.resx files; you work with the .xlf files located in the MultilingualResources folder.
(After you have done lots of translations, a rebuild might be needed so that the translated .xlf files update the translated .resx files.)
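Once the neutral .resx and the translated .xlf files are in place, the lookup at runtime simply follows the thread UI culture and falls back to the neutral language from AssemblyInfo.cs. A minimal illustration (the resource base name and key below are examples, not part of MAT itself):

```csharp
using System.Globalization;
using System.Resources;
using System.Threading;

class Demo
{
    static void Main()
    {
        var resources = new ResourceManager("MyApp.Resources", typeof(Demo).Assembly);

        Thread.CurrentThread.CurrentUICulture = new CultureInfo("nl");
        System.Console.WriteLine(resources.GetString("Greeting")); // Dutch, if translated

        Thread.CurrentThread.CurrentUICulture = new CultureInfo("en");
        System.Console.WriteLine(resources.GetString("Greeting")); // neutral/English fallback
    }
}
```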
Where to get it:
feedback: https://multilingualapptoolkit.uservoice.com/
visual studio extension: https://marketplace.visualstudio.com/items?itemName=MultilingualAppToolkit.MultilingualAppToolkit-18308
knowledge base: https://multilingualapptoolkit.uservoice.com/knowledgebase
github of Cameron (Microsoft) who manages this project: https://github.com/TheMATDude
I have had two problems with resource files, both about the productivity of the translators (the people) rather than the speed of string lookup.
The sales staff at the overseas office who did the translations could not cope with editing XML or learning any new tool.
So they just used Excel to edit the translations. Therefore, we might as well have stored the translated strings as a CSV file, avoiding having to copy the translated strings into the resource files.
A new build needs to be done in order to see the effect of any translations.
Once again, if the translated strings were stored in a CSV file, we could have cached them in the ASP.NET cache. Then any changes to the translations would show up on the next page load.
So we could have used a custom implementation of the resource provider and kept to the standard ASP.NET resource lookup system, or just ignored the standard resource lookup system if it does not help in your case; it depends on how your pages are written.
You may find at some point that you wish to be able to override strings for a single customer; if so, you will need a multi-stage lookup system. Otherwise, you have to merge the customer’s custom strings with the translated strings each time you ship a new version of the system.
For point #4:
I have been using .resx files for all strings on our site that must be localized into many languages and haven't had any major issues with them.
The one thing that you need to think about is if you want this text to be searchable. For some of the sites I work on there are some localized resources that need to be searchable so I must keep them in the database. However, when I have the choice I prefer the .resx file for similar reasons mentioned above.
I will simply add that you should look for custom implementations (or write your own) of the resource provider (a provider model, like the membership provider) to store your resources in a database. That's what we did for our CMS, and it's very useful.
When we first looked for an example back then we found Creating a Data Driven ASP.NET Localization Resource Provider and Editor.
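For orientation, the provider-model hooks involved are ResourceProviderFactory and IResourceProvider in System.Web.Compilation. The following is only a bare-bones sketch in the spirit of that article: the table layout and lookup are hypothetical, a real provider would cache results, and the factory would be registered via the resourceProviderFactoryType attribute on the <globalization> element in web.config.

```csharp
using System.Globalization;
using System.Resources;
using System.Web.Compilation;

public class DbResourceProviderFactory : ResourceProviderFactory
{
    public override IResourceProvider CreateGlobalResourceProvider(string classKey)
    {
        return new DbResourceProvider(classKey);
    }

    public override IResourceProvider CreateLocalResourceProvider(string virtualPath)
    {
        return new DbResourceProvider(virtualPath);
    }
}

public class DbResourceProvider : IResourceProvider
{
    private readonly string _resourceSet;

    public DbResourceProvider(string resourceSet)
    {
        _resourceSet = resourceSet;
    }

    public object GetObject(string resourceKey, CultureInfo culture)
    {
        // Hypothetical lookup, e.g.:
        //   SELECT Value FROM Resources
        //   WHERE ResourceSet = @set AND ResourceKey = @key AND Culture = @culture
        return LookupInDatabase(_resourceSet, resourceKey, culture ?? CultureInfo.CurrentUICulture);
    }

    // Only needed for implicit (meta:resourcekey) localization; a dictionary-based
    // reader over all rows for the resource set would go here.
    public IResourceReader ResourceReader
    {
        get { return null; }
    }

    private static string LookupInDatabase(string set, string key, CultureInfo culture)
    {
        return key; // database access omitted for brevity
    }
}
```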
Here is my take on resource files:
I would assume that if there is a LARGE number of strings, using a database might be the best method, since it allows searching and sorting of the data. It would probably not be too difficult to account for multiple languages in a resource table, and the lookups should be fast.
I would think that this is a good method for storing static resources, or things that might be changed by a client. As for dynamic resources, it might be better to use a database, either alone or in conjunction with the file system. I think the new SQL Server has a type that is an optimal hybrid of database and file system storage.
I read in another question (I don't remember which) that keeping resource files in a separate project is good practice, because you wouldn't have to recompile the entire project when resources change; just recompile the resource project. This would also allow (fairly) easy edits by clients, who would only need the source code for the resource project, and not your other real code (API code, etc.).
I have not used resource files enough to make any claims about their reliability, extensibility, or any potential issues that you might have when working with them.
I've been using resource files in a .NET Razor Pages app after dumping our previous proxy server, which used a custom regular-expression language to replace strings as they passed through the proxy.
We dumped the proxy method as it was more suited to large strings (paragraphs) and was pretty awkward for all the dynamic fragments and the like that we had.
We've had no problems at all, and it's faster so far than the proxy server. I store all the target pages, comments, names, the English text, and all the other available languages in a DB; it's trivial to add a new column for a new language.
We have about 5k entries across multiple resx files so far.
I then use a builder process to create all the resx files and place them in the correct local and global folders whenever something is updated.
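For what it's worth, the core of such a builder step is quite small: read the translation rows from the database and emit one .resx per language with ResXResourceWriter (which lives in System.Windows.Forms.dll). A minimal sketch, with hard-coded rows standing in for the hypothetical database query and an example output file name:

```csharp
using System.Collections.Generic;
using System.Resources;

class ResxBuilder
{
    static void Main()
    {
        // In reality these rows come from the translations database.
        var dutch = new Dictionary<string, string>
        {
            { "Greeting", "Hallo" },
            { "Farewell", "Tot ziens" }
        };

        using (var writer = new ResXResourceWriter("Pages.Home.nl.resx"))
        {
            foreach (var pair in dutch)
                writer.AddResource(pair.Key, pair.Value);
            writer.Generate();   // flushes all entries to the file
        }
    }
}
```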
It was dead easy to build a simple interface for translators to search for pages, languages, comments, names, etc. and update them. We chose not to rebuild the resx files automatically on a change, but you could if you trust your translators ;)
We also allow translators to add new fragments/text to translate, but as yet we've not had any bright ideas on how to include them automatically, so we have to manually substitute the string in the source file and recompile.
For editing resx files I've used Zeta Resource Editor:
https://www.zeta-resource-editor.com/index.html
It can open all your languages in one go and highlights differences in placeholders as well as missing translations. You can edit all the languages on one row and save all the files in one go. We don't use it now, as everything is in the DB, but I recommend it.
