I am making a web crawler in C# which needs to find webshops. The problem I'm having is that I need to detect whether a webpage is a webshop. If it is, I need to find out what type of e-commerce software it is using. But I don't know how to detect that from the source code.
I also have a Chrome plugin called BuiltWith, which can detect all kinds of software, but I have yet to find out how it does that.
It would be nice if someone could help me with this problem.
Before giving you an actual answer, it's worth noting that what you're proposing could be in violation of the terms of use for many websites out there. You should take the time to investigate what legal liability you might be exposing yourself and your organization to.
This is going to be a lot of time-consuming work, but it's not difficult. Your crawler just needs a rules-based approach that detects signatures in the payload of each page.
1. Find the specific software that you're intending to detect.
2. Find 2-3 sites that are definitely using the software.
3. Review the HTML payload to see what scripts, CSS, and HTML patterns they have that are common across the sites.
4. Build a code-based rule that can detect those patterns consistently. For example: if (html.Contains("widgetName")) isPlatformName = true;
5. Test those patterns across more sites that you know for certain are using that software.
6. Repeat for each software vendor.
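The steps above could be sketched roughly as follows. This is only an illustration, not how BuiltWith or any real detector works: the platform names ("ExampleCart", "DemoShop") and every signature string are made up, and the idea of requiring a minimum number of matching signatures per rule is my own assumption to reduce false positives.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One detection rule: a platform name plus substrings that tend to appear in its pages.
public sealed class PlatformRule
{
    public string Platform { get; }
    public IReadOnlyList<string> Signatures { get; }
    public int MinMatches { get; }

    public PlatformRule(string platform, int minMatches, params string[] signatures)
    {
        Platform = platform;
        MinMatches = minMatches;
        Signatures = signatures;
    }

    // Counts how many signatures occur in the HTML (case-insensitive).
    public int CountMatches(string html) =>
        Signatures.Count(s => html.IndexOf(s, StringComparison.OrdinalIgnoreCase) >= 0);
}

public static class PlatformDetector
{
    // Hypothetical rules; replace them with signatures you collected from known sites.
    public static readonly IReadOnlyList<PlatformRule> Rules = new[]
    {
        new PlatformRule("ExampleCart", 2, "/examplecart/js/", "examplecart-checkout", "ec_session"),
        new PlatformRule("DemoShop", 1, "<meta name=\"generator\" content=\"DemoShop"),
    };

    // Returns the platform with the most matching signatures,
    // or null when no rule reaches its threshold.
    public static string Detect(string html)
    {
        return Rules
            .Select(r => new { Rule = r, Hits = r.CountMatches(html) })
            .Where(x => x.Hits >= x.Rule.MinMatches)
            .OrderByDescending(x => x.Hits)
            .Select(x => x.Rule.Platform)
            .FirstOrDefault();
    }
}
```

Keeping the rules as data rather than a chain of if statements makes it easier to add a vendor, or a second rule for another version of the same vendor, without touching the detection code.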
Things get more complicated when a target has multiple versions and your rules need to account for each of them, or when two platforms are very similar.
I think the most complicated part of this is having a well-thought-out bot issue detection, reporting, and throttling architecture in place. You should probably spend the bulk of your time planning that.
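As one small piece of that architecture, a per-host throttle might look like the sketch below. The class name HostThrottle, the Reserve method and the fixed minimum delay are all assumptions for illustration; a real crawler would probably also honor robots.txt and back off on error responses, and this version is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Tracks the earliest time each host may be requested again.
public sealed class HostThrottle
{
    private readonly TimeSpan _minDelay;
    private readonly Dictionary<string, DateTime> _nextAllowed =
        new Dictionary<string, DateTime>(StringComparer.OrdinalIgnoreCase);

    public HostThrottle(TimeSpan minDelay)
    {
        _minDelay = minDelay;
    }

    // Returns how long the caller should wait before requesting this URL's host,
    // and reserves the slot that follows that wait.
    public TimeSpan Reserve(Uri url, DateTime nowUtc)
    {
        DateTime next;
        if (!_nextAllowed.TryGetValue(url.Host, out next) || next < nowUtc)
            next = nowUtc;

        _nextAllowed[url.Host] = next + _minDelay;
        return next - nowUtc;
    }
}
```

A caller would wait for the returned interval (for example with Task.Delay) before issuing the request. Passing the current time in as a parameter keeps the class easy to test.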
That's it.
There are a couple of different ways to determine the technologies a site is using. Firstly, if you are technically savvy, you can right-click on an eCommerce page (a catalog page, a checkout page, etc.) and look at the source code. Many platforms leave hints in the source code that give you an idea of what the site is running.
You can also look at the DNS/hosting information, which would help you determine if the eCommerce solution is hosted or SaaS (like Shopify, for example).
You can also look the domain name up through InterNIC. The results will include the nameservers, which could point you in the right direction.
Finally, if all that sleuthing seems too difficult, there’s an easier way! Try BuiltWith. It’s generally pretty reliable, as long as the system you're looking up isn’t custom/proprietary. Enter a domain into BuiltWith and it will show you the platform, widgets used, analytics and tracking codes, CDNs, CMS, payment processors, and more.
I have a pawnshop CRUD app written 20 years ago with INFORMIX-SQL/SE (DOS) which is currently running on DOS 6.22 within Microsoft Virtual PC 2007 on Windows Vista. I would like to modernize this app with a GUI, SQL-based engine and retain its existing functionality. It doesn't require any networking or multi-user capability. I would prefer a product which is royalty-free.
I would also like to rewrite it quickly, with as little effort as possible. Which tool would you recommend?
I'm debating whether to re-write my INFORMIX-SQL app with I4GL (character-based) or another Windows/GUI-based tool.
My app is very robust and has some incredible features which my users are very happy with. The only obstacle keeping me from effectively achieving market penetration is, believe it or not, that my app is char-based, and I would like to duplicate the same functionality with a GUI. My feeling is that it's quicker for a user to process a transaction with my char-based app than to have to position a cursor with a mouse, but the cosmetics are hurting me!
I would like to know specific instances of limitations, bugs or drawbacks of using another development tool before I invest a considerable amount of time evaluating another product. Answers to this question could save me a lot of time and money!
If you visit www.frankcomputer.com you can view a video demo of my pawnshop app. (CAVEAT: The website's in Spanish; use Google Translate to get a more-or-less decent translation of the text. Start the video at the two-minute mark, at 720p resolution and in full screen, to best comprehend my app.)
If I were doing it, I would probably choose to write a WPF GUI in C# with a SQL Server Express backend database. An embedded database like SQLite might work as well. But the main reason I would choose that is because that's what I'm most familiar with. Someone else would likely choose something else...
I might also choose ASP.NET MVC and make it a web application if that were an option (you say that multi-user is not required, but I say it's not required yet).
Also, if you're not the one who's going to be developing it (i.e. you're going to hire someone to build it for you) then I would say that you should find the developer first and let them choose (or at least have a say in) the technology. If you choose the technology up-front then you're simply limiting the field of developers who'll be able to work with you and there's really not much point in that.
I'd recommend you use Python with a PostgreSQL backend. Now some will think this is overkill, but after watching your video and reading your site (I had to use a translator), I suspect the added flexibility is something you will truly enjoy by going this route.
The reasons I'd argue for this solution are:
Python and PostgreSQL are both great products with amazing communities when you need them.
Both products have a bright outlook in their development paths. Since you obviously spent a lot of time and effort tweaking SPACE, I'm betting you will do the same over the next 40 years. So, the tools you choose now need to be there for you as you continue your development cycle.
They are both free with friendly licenses.
Cross-platform support.
Scalability. You can use PostgreSQL installed locally and connect via socket or scale it all the way up to several servers using load balanced connection pooling.
Security.
Data integrity. This includes how easy it is to back up your whole environment, and thus to restore it in the event of a catastrophe.
Whatever tools you end up choosing, I wish you the best in this project. I can tell you are working on something you truly love, and that is something more of us should strive for!
Based on your answers, your emphasis on the time needed to make the changes, and the fact that you don't seem to want to change the application at all but are being forced to, you should certainly evaluate Genero from 4js.
This will allow you to reuse your existing code while providing a nicer-looking front end. You can also maintain a single codebase supporting both character and GUI clients.
Choose whatever language and technology is easiest for you. If you need DB access and a short lead time it sounds like Java or Visual Basic would be best. Both have plenty of free tools to get you started.
The top language tags on Stack Overflow are C# (by a long margin), then Java, PHP and .NET, followed by C++ and Python. Some of that will be skewed by the Joel & Jeff origin of the site, but any of those is more than capable of the task. Personally, I'd go with Java or Python, but I don't like being tied to the Microsoft stack.
wxWidgets and Qt might be options for the GUI components.
Of the databases, MySQL, SQL Server Express and Oracle Express Edition are all free and robust. SQLite is good enough for most single-user applications, though. I'd put the database choice at the bottom of the 'importance' list. For small-scale single-user apps, you should be able to chop and change DB platforms without much hassle. The biggest relevance would be in how you actually back up, copy and restore data in the event of disk failure or corruption.
I will be taking on the role of support for a complex application that is transitioning from the development team. This application is a sharepoint solution that connects to several (7) web services. The development team is rolling off almost immediately and will be available only for small questions.
I'm new to this role, so I'm wondering what suggestions you have for me as I take on this large project. What considerations should be made so that the transition to support is smooth and uninterrupted?
I've been reading the documentation, but I can already see some gaps that need to be filled. The application is very (perhaps overly) configurable and there is lots of injected code. Stepping through the code is about the only way I can gain an understanding of what is actually happening.
It sounds like you've already got your environment set up if you're able to debug the application, so that's the first thing I was going to suggest in a knowledge-transfer situation. Some general things that I would get from the developers before they depart:
1. A list of third-party components that the application uses, along with license information and website logins if applicable.
2. Access to every part of the environment that this thing runs on, both production and development. That means the source code management system, database server(s), etc. It sounds like you have some of these already, but make sure you get access to absolutely everything.
3. If your development environment was given to you "as is" (i.e. you took it over from one of the departing developers), make sure you know how to rebuild it from scratch. They might have a document that describes the process of building a development box, but if not, maybe you can get them to show you how to set up a fresh machine.
4. Item 3 will go a long way towards this, but if setting up a server to run the application is different in any way from setting up a development environment, you'd want to know how, so you can diagnose server configuration issues if they crop up, or even rebuild a server. This sort of thing may be someone else's responsibility, though, depending on your organization.
Once you have those, you probably want to get some understanding of why the application does the things that it does. That will give you the context you need to understand support and enhancement requests when they come in.
Are the original developers the only source of this information, or are there business people who you will be working with after the developers leave? One of the first things I try to do when starting on an existing application that's new to me is to find someone who knows the business well and have them give me a high-level run-down of the application's purpose in life. From there you can go into more detail on individual components/features/whatever as needed. The business people may be a better source for this information than the developers are, so you may want to try them first.
Hopefully some of that helps.
If you're not the systems admin (as opposed to the SharePoint admin), develop an understanding with them of what tasks you are able to do and what you need of them.
This may include things like stopping and starting services (IIS, Timer Service, etc.) and filesystem and DB monitoring and maintenance. Getting this sorted out up front saves a lot of pain later.
If the sys admins don't have some understanding of SharePoint, educate them. They will need to know what the deal is with things like code deployments.
It's best not to feel my pain.
I have already designed an application that is nothing more than a simple WinForm with one or two classes to handle data and collection.
Fairly often I find myself refactoring parts of it or adding new features to it, not huge features but small additions to its functionality.
The question I have is what would be the best way to provide an updated program to the user after they have initially downloaded it.
I have thought of a few different options already:
Upload a new version with improvements on CodePlex
Host the application on my personal website but change the file with the latest version
Implement some sort of system that will work in a way similar to add-ons to add the functionality.
Is there a way to provide an updated application without the user having to essentially replace their current version by deleting it and replacing it with a newly downloaded one? Although the CodePlex idea seems worthwhile I wasn't sure if there was a better or easier way.
Thank you for your time.
This is what ClickOnce was designed for.
I've used it regularly in a corporate setting, but it would also be appropriate for an Internet deployment scenario. You may want to invest in a certificate so you can sign your code if this is a commercial product.
Added
Here's a shorter article with a lot of screen shots.
http://www.15seconds.com/issue/041229.htm
(Still looking for more good links).
Added - final addition
Wikipedia sums it up succinctly.
http://en.wikipedia.org/wiki/ClickOnce
I am building a prototype for a web-based application and was considering building the front-end in HTML, which can then be reused later for the actual application. I had done a Flash-based prototype earlier, which embedded the .swf into a C# executable. Flash made for rapid turnaround time while the Windows application provided unlimited access to fancy APIs for DB access and sound.
I want to consider something similar for this one too. Does this approach make sense? I am particularly concerned about the way the HTML would communicate with the container app. From what I understand out of preliminary research, it would be only through JavaScript, which might quickly get unwieldy. This is especially so because unlike the Flash-based prototype which implemented a lot of its functionality in the .swf, the HTML UI will depend entirely upon the shell to maintain state. Also, I don't need anything more than access to a database. So a desktop application might be overkill.
Another alternative that comes to mind is to build the prototype using PHP and deploy it with a portable server stack such as Server2Go or XAMPP. But I've never done something like this before. Can anybody here shed some light on the drawbacks of this approach?
The key requirement is rapid iterations of the UI, reusable front-end code and simplified deployment without any installations or configuration.
Some of the best programming advice I've seen came from Code Complete, and was along the lines of, "evolutionary prototypes are fine things, and throwaway prototypes are fine things, but you run into trouble when you try to make one from the other." That is, know which type of prototype you're developing, and respect it. If you're developing a throwaway prototype, don't permit yourself to use any of it, however tempting it may be, in the production system. And if you're developing an evolutionary prototype - one intended to become the production system - don't compromise quality in any way.
It sounds like you're trying to get both, the rapid development of a throwaway and the reusability of an evolutionary prototype - and you can't. Make up your mind, and stand by it. You can't have your cake and eat it, too.
I think you're off to the wrong start here. Why would you want your prototype to be fully functional? A prototype is intended to be throw-away and to help flesh out requirements and UI. If you need full functionality, why not just skip to the final product? If prototyping is really something you want to do, I suggest looking into a specialized prototyping tool.
Are you prototyping the user interface for a customer? If you are, consider something less unwieldy like paper prototypes or presentation software (like PowerPoint) until you get the UI nailed down. If you can establish the UI and are clear about the customer's requirements, you can then develop the application in whatever the actual platform is going to be with a clear model in mind.
In my current project, I prototyped the UI in PowerPoint first. In a subsequent iteration, I used static web pages and some jQuery plugins to simulate actual user interaction. That proved to be very effective in demonstrating the interface, and I didn't have to build the application first.
I would join in on folks suggesting paper prototyping as the "idea", but not necessarily the implementation. The biggest point here is that tools such as HTML or Flash let you get "bogged down" in the details - what does this color look like? What's the text on this thing? Lots of time can pass by that way. Instead, what you should be focusing on is user flows.
One tool that keeps the spirit of paper prototyping without all the "paper" drawbacks is Balsamiq: http://www.balsamiq.com/demos/mockups/Mockups.html. It was covered by Jeff and Joel in one of the Stack Overflow podcasts; I've been using it for my own projects for a while. It's freeware, and it does its job magnificently.
If you know C# then another option you can look at is Silverlight. You can then leverage your knowledge of C# and/or JavaScript and interact with a rich object model.
Would that do what you are looking for? The installation would be minimal on the part of the client: download and install the Silverlight plugin.
If prototyping is something you truly wish to accomplish here, paper and pencil will be your best friends. You can draw out as many iterations as necessary. While none of this is ultimately useful later on once you begin coding, it is as quick as it gets.
As mentioned previously, there are many prototyping tools which have a bit of a learning curve, but an alternative to consider would be a framework such as CakePHP or Ruby on Rails, which makes for fast application logic and leaves customizing the front end as the main hard work. Plus, you're left with a mostly functional application when you're done prototyping, which can be tweaked as needed.
In either scenario, you're paying with your time, either upfront (in the case of learning a new framework) or in installments over time (in the case of prototyping on paper or coding by hand).
Windows, Firefox and Google Chrome all monitor usage statistics and analyze the crash reports that are sent to them. I am thinking of implementing the same feature in my application.
Of course it's easy to litter an application with a lot of logging statements, but that is the approach I want to avoid, because I don't want too many cross-cutting concerns in a function. I am thinking about using AOP to do it, but first I want to know how other people implement this feature.
Does anyone have any suggestions?
Clarification: I am working on a desktop application, and it doesn't involve any RDBMS.
Joel had a blog article about something like this - his app(s) trap crashes and then contact his server with some set of details. I think he checks for duplicates and throws them out. It is a great system and I was impressed when I read it.
http://www.fogcreek.com/FogBugz/docs/30/UsingFogBUGZtoGetCrashRep.html
We did this at a place I was at that had a public server set up to receive data. I am not a db guy and have no servers I control on the public internets. My personal projects unfortunately do not have this great feature yet.
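A bare-bones version of that trap-and-report idea might look like the sketch below. This is not Joel's implementation: the class and method names are hypothetical, the fingerprint (a SHA-256 hash of exception type plus stack trace) is just one plausible way to recognize duplicates, and the send callback is a placeholder for whatever upload mechanism you have. In practice the duplicate check would more likely live on the receiving server; it is kept in memory here only for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class CrashReporter
{
    private static readonly HashSet<string> Seen = new HashSet<string>();

    // Builds a stable key from the exception type and stack trace, so the same
    // crash site yields the same key regardless of the message text.
    public static string Fingerprint(Exception ex)
    {
        string raw = ex.GetType().FullName + "\n" + (ex.StackTrace ?? "");
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(raw));
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }

    // Returns true the first time a given crash is seen; duplicates return false.
    public static bool ShouldSend(Exception ex)
    {
        lock (Seen)
        {
            return Seen.Add(Fingerprint(ex));
        }
    }

    // Hooks the process-wide handler for unhandled exceptions.
    // The send delegate is a placeholder for the actual upload.
    public static void Install(Action<string> send)
    {
        AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
        {
            Exception ex = e.ExceptionObject as Exception;
            if (ex != null && ShouldSend(ex))
                send(Fingerprint(ex) + "\n" + ex);
        };
    }
}
```

Because the handler is installed once at startup, the rest of the code stays free of reporting calls, which addresses the cross-cutting concern raised in the question.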
In "Debugging .Net 2.0 Applications" John Robbins (of Wintellect) writes extensively about how to generate and debug crash reports (actually windbg/SOS mini dumps). His Superassert class contains code to generate these. Be warned though: there is a lot of effort required to set this up properly, including symbol servers and source servers, as well as a good knowledge of VS2005 and windbg. His book, however, guides you through the process.
Regarding usage statistics, I have often tied this into authorisation, i.e. has a user the right to carry out a particular task. Overly simply put this could be a method like this (ApplicationActions is an enum):
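public static bool HasPermission(ApplicationActions action)
{
    // Validate that the user has permission (IsAllowed is a hypothetical helper).
    bool allowed = IsAllowed(action);
    // Log the request and its result (LogUsage is a hypothetical helper).
    LogUsage(action, allowed);
    return allowed;
}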
This method could be added to a singleton SecurityService class. As I said, this is overly simple, but it should indicate the sort of service I have in mind.
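To make that a little more concrete, here is a minimal sketch of such a singleton. Everything in it is an assumption for illustration: the enum members, the Grant method and the in-memory counters stand in for whatever permission store and log you actually use, and HasPermission is an instance method on the singleton rather than the static method shown above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical actions; a real application would list its own.
public enum ApplicationActions { OpenReport, ExportData, DeleteRecord }

public sealed class SecurityService
{
    // Single shared instance.
    public static readonly SecurityService Instance = new SecurityService();

    private readonly HashSet<ApplicationActions> _granted = new HashSet<ApplicationActions>();
    private readonly Dictionary<ApplicationActions, int> _usage = new Dictionary<ApplicationActions, int>();
    private readonly object _gate = new object();

    private SecurityService() { }

    public void Grant(ApplicationActions action)
    {
        lock (_gate) { _granted.Add(action); }
    }

    // Checks permission and records the request, so every check doubles as a usage sample.
    public bool HasPermission(ApplicationActions action)
    {
        lock (_gate)
        {
            int count;
            _usage.TryGetValue(action, out count);
            _usage[action] = count + 1;
            return _granted.Contains(action);
        }
    }

    // Number of times permission for this action has been requested so far.
    public int UsageCount(ApplicationActions action)
    {
        lock (_gate)
        {
            int count;
            return _usage.TryGetValue(action, out count) ? count : 0;
        }
    }
}
```

Since the application already has to ask for permission before each task, counting at that point gathers usage statistics without adding separate logging calls throughout the code.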
I would take a quick look at the Logging Application Block that is part of the Enterprise Library. It provides a large number of the things you require, and is well maintained. Check out some of the scenarios and samples available; I think you will find them to your liking.
http://msdn.microsoft.com/en-us/library/cc309506.aspx