Lately, I have developed a keen interest in speech recognition and natural language processing, and I have been experimenting with a few different approaches to building a system that can execute commands given as natural language instructions.
In my study so far, I have come across various NLP tools, but I haven't been able to figure out how to use them for my purpose. C# is my primary language, and sadly, there is hardly anything available for NLP on the .NET platform.
In addition to the learning curve, the regular NLP approach has various problems as well. Language ambiguity, named entity recognition, sentence boundary detection, etc. all add to the complexity. These issues are much more prominent when detecting and parsing free-form, unconstrained language; for a limited domain, the complexity should be lower. However, I couldn't really overcome the challenge, as most tools either depend on huge static dictionaries or have a training process that is too complex.
The other major issue is conversation. Most tools do not keep a conversational history and have no way to identify the context of an incoming instruction.
I was hoping that those of you who have worked on similar technology before could help me iron out these challenges and point me in the right direction.
Can you share your experience with the various tools, the approaches you took, the roadblocks you faced, and how you resolved them along the way?
Update: Let me also include a brief overview of what I envision. The system would essentially be just a command executor that understands simple English. So, if I say "send an email to John", it should understand that I want to send an email and then ask me questions to get more information, such as what the subject line and the content should be. Additionally, if there is more than one John in my address book, or more than one email address for John, the system should be able to detect that too and ask me for further directions.
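A minimal sketch of that "keep asking until the command is complete" behaviour, in Python rather than C#, assuming a hypothetical in-memory address book, a single regex standing in for a real NLP engine, and illustrative names (parse, next_question, REQUIRED). Each action declares the slots it needs; the dialogue asks for whichever one is missing and, when a name maps to several contacts or addresses, lists the options instead of guessing.

```python
import re
from dataclasses import dataclass, field

# Hypothetical address book: one first name can map to several contacts,
# and one contact can have several email addresses.
ADDRESS_BOOK = {
    "john": [
        {"name": "John Smith", "emails": ["john@work.example", "jsmith@home.example"]},
        {"name": "John Doe", "emails": ["jdoe@example.com"]},
    ],
    "mary": [{"name": "Mary Jones", "emails": ["mary@example.com"]}],
}

# Slots each action needs before it can be executed.
REQUIRED = {"send_email": ["recipient", "subject", "body"]}


@dataclass
class Command:
    action: str
    slots: dict = field(default_factory=dict)


def parse(utterance: str):
    """Toy parser: recognises only 'send (an) email to <name>'."""
    m = re.search(r"\bsend (?:an )?e-?mail to (\w+)", utterance.lower())
    return Command("send_email", {"recipient_name": m.group(1)}) if m else None


def next_question(cmd: Command):
    """Return the next clarifying question, or None once every slot is filled."""
    if "recipient" not in cmd.slots:
        name = cmd.slots["recipient_name"]
        contacts = ADDRESS_BOOK.get(name, [])
        if not contacts:
            return f"I don't know anyone called {name.capitalize()}. Who do you mean?"
        # Flatten (contact, address) pairs so both kinds of ambiguity surface at once.
        options = [(c["name"], e) for c in contacts for e in c["emails"]]
        if len(options) > 1:
            listing = "; ".join(f"{i}) {n} <{e}>" for i, (n, e) in enumerate(options, 1))
            return f"Which {name.capitalize()} do you mean? {listing}"
        cmd.slots["recipient"] = options[0][1]  # unambiguous: fill it silently
    for slot in REQUIRED[cmd.action]:
        if slot not in cmd.slots:
            return f"What should the {slot} be?"
    return None
```

The caller stores each answer in cmd.slots and calls next_question again until it returns None, at which point the command can be handed to the execution engine.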
For the implementation, I think I need the following components:
A speech-to-text converter.
An NLP engine to parse the text and identify the action and the objects on which the action is to be performed.
An execution engine to create and co-ordinate different agents to perform the different types of actions.
The challenge lies in making the system extensible, so that it can support more actions like these at a later stage with only a little modification.
I think I am fine with the speech-to-text and execution parts. The pain point is the NLP engine, which needs to understand the natural language correctly and give me the exact action and its parameters.
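One common way to get that extensibility, sketched here in Python rather than C# with hypothetical action names and placeholder handlers: the execution engine looks agents up in a registry keyed by action name, so adding a feature means registering one more handler rather than editing the dispatcher.

```python
from typing import Callable, Dict

# Registry mapping an action name to the agent (handler) that performs it.
HANDLERS: Dict[str, Callable[..., str]] = {}


def action(name: str):
    """Decorator that registers a handler under an action name."""
    def register(fn):
        HANDLERS[name] = fn
        return fn
    return register


@action("send_email")
def send_email(recipient: str, subject: str, body: str) -> str:
    # Placeholder: a real agent would call a mail API here.
    return f"email to {recipient}: {subject}"


@action("set_reminder")
def set_reminder(when: str, text: str) -> str:
    # Placeholder: a real agent would talk to a calendar or scheduler.
    return f"reminder at {when}: {text}"


def execute(action_name: str, params: dict) -> str:
    """Dispatch a parsed (action, parameters) pair to its registered agent."""
    handler = HANDLERS.get(action_name)
    if handler is None:
        raise KeyError(f"no agent registered for {action_name!r}")
    return handler(**params)
```

The NLP layer only has to produce an action name plus keyword parameters; the design choice is that the dispatcher never needs to know which actions exist.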
I have played around with POS taggers. They do not help much with compound statements, and it gets tricky to establish the relationships between the various verbs and nouns detected in a sentence.
Another issue is maintaining the context of previous actions and using it to make sense of the current statement.
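A very small version of that context handling, sketched in Python under the assumption that the NLP engine has already reduced each utterance to an (action, person) pair, where either part may be missing or a pronoun. DialogueContext and resolve are hypothetical names, and the "most recent mention wins" rule is a deliberately naive heuristic, not a real coreference resolver.

```python
class DialogueContext:
    """Carries the previous turn's action and person into the current one."""

    PRONOUNS = {"him", "her", "them"}

    def __init__(self):
        self.last_action = None
        self.last_person = None

    def resolve(self, action, person):
        # "...and Mary too": no verb in this turn, so reuse the previous action.
        if action is None:
            action = self.last_action
        # "call him": replace the pronoun with the most recently mentioned person.
        if person in self.PRONOUNS:
            person = self.last_person
        self.last_action = action
        if person is not None:
            self.last_person = person
        return action, person
```

A real system would track more entity types and forget stale context; this only shows where such state would live between turns.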
P.S.: Convert this to a wiki if you feel it is appropriate. Please don't flame me for asking such a generic question.
You can't separate learning from coding from scratch; otherwise it is meaningless.
Lisp is probably the best language for natural language processing; the classic Emacs psychiatrist session is an example of what can be accomplished with little work. It's a scientist's language with a functional programming style, and the whole way of thinking is different from regular C-like programming; OOP has no bearing on it. It is one of the oldest languages still in use, and a dialect of it, Scheme, is used in MIT's introductory course. It is a classic in the field of AI research.
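The Emacs psychiatrist (M-x doctor) descends from Weizenbaum's ELIZA, which works by matching keyword patterns and echoing fragments back with the pronouns swapped. A tiny sketch of that technique, in Python rather than Lisp, with a hypothetical three-rule table far smaller than the real one:

```python
import re

# Hypothetical rules: (pattern, response template using the captured text).
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (\w+)", re.IGNORECASE), "Tell me more about your {0}."),
]

# Swap first and second person so echoed fragments read naturally.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my"}


def reflect(fragment: str) -> str:
    return " ".join(REFLECTIONS.get(word.lower(), word) for word in fragment.split())


def respond(sentence: str) -> str:
    sentence = sentence.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(sentence)
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return "Please go on."  # fallback when no keyword matches
```

For example, respond("I need a break from my job.") returns "Why do you need a break from your job?". There is no understanding involved, which is exactly why it takes so little work.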
The problem is that you can't easily interface it with your sound input devices, at least with most free versions I know of. So you can do the awesome, pure logic of natural language processing in clever ways, but it can't hear you. You can use stdin/stdout, text-file, and database wrappers around independent sound-to-text and text-to-sound converters, but then the flow is broken up at uneven levels and is no longer natural, because in natural speech, context is driven by an understanding of the environment.
If I'm talking with a visiting friend while my AI-enabled PC is on and I say "Just rm -rf everything", how does my machine know whom I'm talking to? A silly solution would be for the machine to accept only direct addressing ("Computer, do X Y Z"), but what if I say to my friend "On that computer, rm -rf everything"? Context can only exist through awareness, and awareness requires some sort of AI.
This is not something to tackle with if/else/then or class hierarchies. Of course, at the end of the day many Lisp implementations are written in C, but that is a minimal base over which you can construct the rest.
So for a mix of learning and a "practical approach", you would need to extend a Lisp interpreter with functions that handle hardware input/output and convert the data back and forth. This would be conceptually simple and almost secondary in the grand scheme of things, though not without considerable effort.
But the most important part would be the Lisp program itself. For example, you would have to find ways to add an intonation property that distinguishes stress or eagerness, in order to prioritize an order or create a new context. There is also the problem that people often say nonsensical, illogical things when tired or stressed, or use local idioms like "chill out", "hold your horses", etc.
A beginner's program should not deal with the examples above right away, but it should be designed to be extensible so that it can address such conditions in the future.
Simple "do this and that" voice recognition and processing of the order is easy; making it contextual takes a lot of work, on both human issues (language, psychology, culture, etc.) and computer science issues.
An ideal piece of software would be able to use common code to speak both Japanese and English, even though their grammar and phonetics are completely different.
Well, this could almost become an exposition of philosophy, and it could go on endlessly, so I'll stop here. I hope this mini essay is helpful somehow.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
I've just inherited a large project previously coded by about 4-5 people. The documentation consists of comments, and they are not very well written. I have to get up to speed on this project. How do I start? It consists of many different source files. Do you just dig in? Are there tools that can help visualize the structure/flow?
If you have a chance, I'd try and talk to the original designers and developers. Ask them about any major design issues or shortcomings of the project. Is the project in good shape and only needs maintenance, or are there major components that need to be added or reworked? What are going to be the biggest roadblocks to maintaining the project? Take one or two of them to lunch (separately) if you have the budget for it, as they might feel freer to talk about problems outside of the office.
Talking to the users is also important for getting a feel for the current status of the project. Quite often they have a different opinion of how things stand than the developers do. Make sure, however, that they don't start giving you a list of all the things they want added or changed; you should take a few weeks to understand the project before you start making major changes to it.
As for visualization tools, I'd start with the database design if there is a database involved. Tools like Microsoft Visio can create a diagram from an existing database. I find knowing the design of the database helps me wrap my head around what the programmers were trying to accomplish. Visio is also good for documenting program flow with some basic flowcharts though you'll have to create them yourself - it doesn't generate them automatically as far as I know.
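If the database happens to be SQLite (an assumption; SQL Server and other engines expose the same information through their own catalog views), the table-and-relationship picture that a Visio diagram shows can be pulled out with a few catalog queries. A minimal Python sketch using only the standard sqlite3 module; describe_schema is an illustrative name:

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection) -> dict:
    """Return {table: {"columns": [...], "references": [(column, other_table), ...]}}."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info('{table}')")]
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        refs = [(row[3], row[2]) for row in conn.execute(f"PRAGMA foreign_key_list('{table}')")]
        schema[table] = {"columns": columns, "references": refs}
    return schema
```

Printing the references of every table gives you the edges of the entity-relationship diagram, which is usually enough to see which tables sit at the centre of the design.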
Good luck.
I would encourage you to buy and read this book thoroughly. It provides you a LOT of information in this regard, much more than you will find here.
Brainstorming a little for you:
Step through the application with a debugger, and use a static code analysis tool for whichever language you are working with...
Talk with people - both developers AND USERS - to get a feel for the application.
Review the issue tracking system to see whether there are any recurring types of problems...
Are there tools that can help visualize the structure/flow?
The latest Visual Studio 2010 allows you to generate architecture diagrams.
http://ajdotnet.wordpress.com/2009/03/29/visual-studio-2010-architecture-edition/
Try to find the starting point of the system and start digging from there. It sort of sucks to be in that situation, and chances are the comments might not be that helpful either. If the original developers didn't bother (or didn't have the chance) to document, chances are they never kept the comments up to date with code changes.
So time to bring out the shovel... but don't just dig in blindly. One important thing is to understand what the system does from a user's perspective.
Concurrently with your code digging, you need to meet with a user (or the users' liaison) and have him walk you through the system, showing you how it is supposed to be used, for what purpose, and what it and its subsystems are supposed to do. Moreover, try to understand the business preconditions and postconditions of each major operation performed with this system.
Then draw a map (or a hierarchical chart) of the main functions of the system, and classify them by category, purpose, or module. If the system performs some sort of workflow or business transaction, try to draw a state/transition diagram documenting each one (and cross-referencing each state/transition to the subsystem or module in the system that is responsible for it).
Once you have that, you can dig according to function. It is best to dig with a specific purpose, say, a bug fix you need to implement. You locate the logical module or category pertaining to that bug fix, you have its preconditions and postconditions, and then you can dig precisely on (or around) that bug fix.
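Such a state/transition chart can also be kept as a plain table next to the code, which makes the cross-reference to modules checkable. A sketch in Python, where the order workflow, its states, the module names, and the helper names are all hypothetical:

```python
# Hypothetical workflow documented as a transition table:
# (state, event) -> (next state, module responsible for the transition)
TRANSITIONS = {
    ("new", "submit"): ("submitted", "OrderEntry"),
    ("submitted", "approve"): ("approved", "Approvals"),
    ("submitted", "reject"): ("rejected", "Approvals"),
    ("approved", "ship"): ("shipped", "Warehouse"),
}


def transition(state: str, event: str):
    """Return (next_state, module), or raise if the transition is not documented."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"undocumented transition: {event!r} from {state!r}") from None


def modules_for(events):
    """Follow a sequence of events from 'new'; return the final state and modules touched."""
    state, modules = "new", []
    for event in events:
        state, module = transition(state, event)
        modules.append(module)
    return state, modules
```

When a bug report describes a scenario, modules_for tells you which modules to dig into first, and any ValueError points at behaviour nobody has documented yet.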
If you just dig in without a guide (at least a high level one), you can be digging for months without getting anywhere (I'm telling you from painful experience.)
If there is no user manual, write a draft based on your meetings with the users/users' liaison. That could serve as a guide for writing a developer's/administrator's manual for the system you just inherited (if there is ever a chance to write one).
If the code is not under source control, put it there. It doesn't matter which SCM you pick (it could even be CVS, yuck!); what matters is getting it under source control ASAP.
Those developers didn't exist in a vacuum; they must have exchanged emails. Identify the other tech liaisons they worked with. Try to identify which other systems, if any, this system interfaces with (e.g. your databases, other people's databases, cron jobs, etc.).
But this could come at a later time. I think you should, for starters, focus on understanding how to use the system and what it is for. Let's call it understanding its business/knowledge architecture. Then dig according to that... or better yet, according to that and with the purpose of fixing a bug.
Good luck.
Use a profiler to see the main functions and events in your project (the fastest way to learn the framework)
Learn the business logic very well to better understand the code
Document every new thing you learn: set up a wiki (you will be surprised how quickly things are forgotten)
You can use Visio to draw Database Model Diagrams (keep them close at hand while studying the code)
These are the things that helped me when I inherited my previous project (50+ developers, a 70+ GB database, 1 GB of source code, not a single line of comments in the code (well, maybe a few :)), and everything written in a foreign language).
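For a Python code base (an assumption; .NET and Java have their own profilers), the standard library's cProfile gives a quick list of which functions a scenario actually exercises. A minimal sketch; hot_functions and the toy work function are illustrative names:

```python
import cProfile
import io
import pstats


def hot_functions(entry_point, limit=5):
    """Run entry_point under cProfile; return the top `limit` rows by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(entry_point)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


def work():
    # Stand-in for "click through one feature of the application".
    return sum(i * i for i in range(10_000))
```

Calling print(hot_functions(work)) lists the functions with the highest cumulative time; sorting by "cumulative" puts the high-level entry points at the top, which is what you want when learning the structure rather than optimizing.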
Use the debugger to walk through the application. That will let you go both deep and wide. You'll also be able to learn about how the code handles specific scenarios.
When you're ready to change something, as @Jaxidian said, Working Effectively with Legacy Code is a great resource.
I was recently in a similar situation. What helped in my case was focusing on the changes I needed to perform on the project, and in the process of making those changes I learned about how the project is structured and so on. Sure, the first few tasks took a bit longer, but look on the bright side: I got stuff done and I got familiar with the project at the same time.
I'd suggest two things that may help:
Be productivity-driven. In other words, find a change that needs doing and use this to learn how that bit of the system works. Your changes may not be the most elegant without a whole-picture understanding of the software, but you will get work done within days/weeks.
Follow things from the user interface. For example, if a change involves things a user does on a dialog, find that dialog in the code (relatively easy) and then work backwards to see what bits of the code provide data to the dialog, how the dialog interacts with the system, etc. Trying to find "where does X happen in the code" is very hard without good documentation, but finding "where is the code relating to this dialog" is quite easy and gives you an entry point into the code.
Whenever I start a new project, I spend 2-3 days skim reading the code and making notes. I basically go through the entire solution from top to bottom and make a map in a text editor of each (significant) class in each project and what it appears to do.
The aim in doing this is not to completely understand the entire codebase, so don't worry if you feel you are not getting your head around it completely. The aim is that you end up with an index of where to go when you need to start on your first piece of work. You should also end up with a cursory picture of the solution in the back of your brain that will get filled in over the next couple of months. I always do this in the first few days, as your superiors will not expect you to be productive during this time, and you may never get another opportunity where you have the time to do so.
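Part of that map can be bootstrapped automatically. Assuming a Python code base (for a C# solution the same idea would need a .NET parser or reflection tool), this sketch lists every class per file with its method names, leaving you to write down what each one "appears to do". It reads the code structure rather than comments. index_source and class_index are illustrative names:

```python
import ast
from pathlib import Path


def index_source(source: str):
    """Return [(class name, [method names])] for every class in one source file."""
    entries = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            methods = [n.name for n in node.body
                       if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
            entries.append((node.name, methods))
    return entries


def class_index(root: str) -> dict:
    """Map each .py file under root (as a relative path) to its classes and methods."""
    index = {}
    for path in sorted(Path(root).rglob("*.py")):
        entries = index_source(path.read_text(encoding="utf-8"))
        if entries:
            index[path.relative_to(root).as_posix()] = entries
    return index
```

Dumping class_index("path/to/solution") into a text file gives the skeleton of the notes; the "what it appears to do" column is still yours to fill in.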
Also, do not rely on code comments for direction. Even with the best intentions they are often unmaintained and may lead to incorrect conclusions about what a class or section of code may do: a comment may lie but the code always tells the truth.
If you already have a team, you could assign each member a part of the framework and have the results of their exploration recorded somewhere, such as a wiki. After that, give each of them a task similar to something that is already done in the system (from a functional point of view).
For example, if a list of products is displayed in your app, you could display a list of orders (the complexity should be approximately the same) in the same way it is currently done in the app. Then make it more interesting: try to edit it and save it to the DB.
Then swap the tasks and let the questions come up; the person who did that task first can then show and explain how things are done.
This way you'll see fairly easily how things are done, and your team will be up to date with this knowledge.
Presuming there is a database, start with the data model. As Fred Brooks wrote in The Mythical Man-Month (paraphrasing), "if I have your tables, I don't need to see your code."
Regarding potential tools, you may want to look into NDepend. It is a code-analysis tool, with an emphasis on highlighting the internal organization and dependencies of the code base (see this post for typical outputs), and spotting code quality issues. I have not used it personally, but Patrick Smacchia, one of the developers of the product, has a few posts where he applies NDepend to some classic apps (here is NUnit for instance) and discusses what it means, and I found them interesting.
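NDepend analyses .NET assemblies; as a rough illustration of the kind of dependency analysis such tools do, here is a Python sketch that assumes a Python code base supplied as module-name-to-source pairs (function names are illustrative). It builds the internal import graph and reports one dependency cycle, a typical structural problem these tools highlight.

```python
import ast


def module_dependencies(sources: dict) -> dict:
    """Given {module_name: source_code}, return {module: set of internal modules it imports}."""
    graph = {}
    for module, code in sources.items():
        deps = set()
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                deps.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                deps.add(node.module)
        # Keep only modules that belong to the code base being analysed.
        graph[module] = {d for d in deps if d in sources and d != module}
    return graph


def find_cycle(graph: dict):
    """Return one dependency cycle as a list of modules, or None (depth-first search)."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {m: WHITE for m in graph}
    stack = []

    def visit(m):
        color[m] = GREY
        stack.append(m)
        for dep in sorted(graph[m]):
            if color[dep] == GREY:  # back edge: dep is still on the current path
                return stack[stack.index(dep):] + [dep]
            if color[dep] == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[m] = BLACK
        return None

    for m in sorted(graph):
        if color[m] == WHITE:
            cycle = visit(m)
            if cycle:
                return cycle
    return None
```

Feeding the graph to a diagram tool gives the picture; find_cycle points at the modules that cannot be understood (or changed) in isolation.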
Go and speak to the users, read the manual if one exists, and/or go on a training course for the system (internal training departments will sometimes have put one together if there are lots of users).
If you don't know what it's meant to be doing then the chances of you being able to work out how it does it are close to zero.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
Straightforward C#/Java code is extremely difficult to parallelize, multi-thread, etc. As a result, straightforward C#/Java code will use less and less of the total processing power on a box (because everything is now going to be multi-core).
Solving this problem in C# and Java is not simple. Mutability and side effects are key to getting stuff done in C# and Java, but this is exactly what makes multi-core, multi-threading programming so difficult.
Hence, functional programming is going to become increasingly important.
Given that the J2EE/Ruby world will splinter amongst many functional/multi-core approaches (just like it does for just about everything else) while the .NET folks will all use F#, this line of thinking suggests that F# will be huge in two years.
What is wrong with this line of thinking? Why isn't it obvious that F# is going to be huge?
(Edit) Larry O'Brien nails it in this blog post: "Language-wise, in my opinion, this is a set of exercises where C and C++ shine — at least until the multithreading stuff. Languages with list-processing idioms will also do well initially, but may have memory-consumption issues (especially functional languages). Ultimately, I think that the managed C-derived language (Java and C#) have the easiest route to Exercise 9 and then face serious shortcomings with Exercise 10, where concurrency issues play the major role. In my opinion, concurrency is going to become the central issue in professional development in the next half-decade, so these shortcomings are very significant."
Straightforward C#/Java code is extremely difficult to parallelize
Not if you use the Task Parallel Library.
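The Task Parallel Library is .NET-only; as a rough Python analogue of Parallel.For or PLINQ's AsParallel().Select, the standard concurrent.futures module shows the same point: when the per-item work is a pure function, parallelizing the loop is just a parallel map. Note that CPython threads do not run Python bytecode in parallel because of the GIL, so for CPU-bound work you would swap in ProcessPoolExecutor; the structure stays the same. score and parallel_scores are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor


def score(n: int) -> int:
    # Pure function: no shared mutable state, so calls can run in any order
    # on any worker without locks.
    return sum(d * d for d in range(n))


def parallel_scores(inputs, workers=4):
    # Executor.map returns results in input order, just like a sequential map.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score, inputs))
```

The hard part the question describes only appears when score starts mutating shared state; as long as it doesn't, the parallel version is a one-line change.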
Whether F# becomes huge depends on whether the cost/benefit is there, which is not at all obvious. If .NET developers find out that they can write some programs in 1/3 of the time using a functional rather than an imperative approach (which I think might be true for certain types of programs), then there should be some motivation for F# adoption.
Paul Graham's story of his use of Lisp in a startup company is illustrative of this process. Lisp provided them with a huge competitive advantage, yet Lisp didn't take over the world, not because it wasn't powerful, but for other reasons, like lack of library support. That F# has access to the .NET framework gives it a fighting chance.
http://www.paulgraham.com/avg.html
Functional programming is harder to get your head around than imperative programming. F# is a more difficult language in many ways than C#. Most 'developers' don't understand functional programming concepts, and can't even write very good imperative code in C#. So what hope have they got of writing good functional code in F#?
And when you consider that everybody on the team needs to be able to understand, write, debug, fix, etc. the code in the language you choose, it means you need a very strong team -- not just a very strong person -- to be able to use F# as it's meant to be used. And there aren't many of those around.
Add into the mix the fact that there are 8 years of C#/VB code lying around that is unlikely to be migrated, and that it's easier to create libraries that look and feel like the BCL in C#/VB because it's harder to leak things like tuples through public interfaces, and I reckon F# will struggle to gain anything more than usage by strong teams on new, internal projects.
Ask a programming question on SO and specify you are using F#.
Ask the same question and specify you are using C#.
Compare the answers.
Using a novel programming language is a calculated risk--you may get more built-in functionality and syntactic sugar, but you will lose in community support, ability to hire programmers, and working around blind spots in the language.
I'm not picking on F#--every decision of programming language is a risk equation you need to work out. If people didn't take that risk on C#, we'd all still be using VB6 and C++ now. Same with those languages versus their predecessors. You have to decide for your project whether the advantages outweigh the risks.
There isn't really any case against F#, but you have to understand the context of the situation we, as developers, are in currently.
The multi-core architecture is still in its infancy. Moving single-threaded apps over to a parallel architecture is going to take time.
F# is very useful for a number of reasons, parallelism being one of them, but not the only one. Functional programming is also extremely useful for scientific purposes. This will be huge in many sectors.
However, the way you're wording your question, it sounds like you're stipulating that F# is already fighting a losing battle, which is definitely not the case. I've talked to many scientists who use tools such as MATLAB, and a lot of them are already aware of F# and excited about it.
Imperative code is easier to write than functional code. (At least, it's easier to find people who can write acceptable imperative code than acceptable functional code.)
Some things are inherently single threaded (UI* is the best known example).
There's a lot of C#/C/C++ code out there already, and using multiple languages in the same project makes managing that project more difficult.
Personally, I think functional languages will become increasingly mainstream (heck F# itself is a testament to that) but probably never gain lingua franca status like C/C++/Java/C#/etc. have or will.
*This is apparently a somewhat contentious view, so I'll expand upon it.
In a multi-threaded UI, each UI event is dispatched asynchronously and on a thread of its own (the actual management of threads is probably more sophisticated than just spinning up a new one, but that's not really germane to the discussion).
Imagine if this were the case, and you're rendering the window.
The window manager asks you to draw each element (expect a message or a function invocation for each element).
Each element reads its state (implicitly reading the application state)
Each element draws itself.
In step 2, every element MUST lock the application state (or the subset of it that affects display). Otherwise, in the event the application state is updated, the end result of rendering the window could include elements that reflect two different application states.
This is a lock convoy. Each render thread will lock, render, and then release; therefore they'll execute serially.
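The convoy is easy to reproduce. In this Python sketch (element names, state, timings, and render_window are made up for illustration), each element renders on its own thread but must hold the shared state lock while reading and drawing, so the measured peak concurrency is 1 and the total time is roughly the sum of the individual draw times:

```python
import threading
import time


def render_window(elements, draw_time=0.01):
    """Render each element on its own thread; all of them share one state lock.

    Returns (drawn lines, peak number of threads rendering at once, elapsed seconds).
    """
    state_lock = threading.Lock()
    app_state = {"title": "Inbox", "unread": 3}
    drawn = []
    busy = {"now": 0, "peak": 0}  # only touched while state_lock is held

    def render(name):
        # Each element must see one consistent application state, so it holds
        # the lock for the whole read-and-draw step.
        with state_lock:
            busy["now"] += 1
            busy["peak"] = max(busy["peak"], busy["now"])
            snapshot = dict(app_state)
            time.sleep(draw_time)  # stand-in for the actual drawing work
            drawn.append(f"{name}: {snapshot['title']} ({snapshot['unread']})")
            busy["now"] -= 1

    threads = [threading.Thread(target=render, args=(e,)) for e in elements]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(drawn), busy["peak"], time.perf_counter() - start
```

With three elements and draw_time=0.02 the peak is always 1 and the elapsed time is at least 0.06 s: the threads add scheduling overhead but no parallelism.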
Now, imagine you're dealing with user input. First, users are pretty slow, so the benefits are going to be non-existent unless you're doing considerable work on the (one-of-many) UI thread; so I'm going to assume that's the case.
The Window Manager informs your application of user input (once again, message, function call, whatever).
Read what's needed from the application state. (Locks needed here)
Spend noticeable time crunching some numbers.
Update the application state. (Locks needed here as well)
All you've accomplished is changing from explicitly starting a worker thread, to implicitly doing so; at the cost of potential heisenbugs & deadlocks if you're loose with locking your state.
The fundamental problem with UI APIs is that you're dealing with a many-to-one (or one-to-many, depending on how you look at it) relationship: many windows, many elements, or many "input types", all of which affect a single window/surface. Some sort of synchronization has to happen, and when it does, multi-threading no longer has any benefits, only drawbacks.
What is wrong with this line of thinking? Why isn't it obvious that F# is going to be huge?
You're assuming that the masses actually write programs that need multicore support, or that their programs would gain significant benefit from being parallelized. That's a false assumption.
Server side, there's even less need for a parallel language.
Backend server processing already takes enough advantage of multicore/multiprocessor support by its inherently concurrent nature (work for different clients is divided among threads, and work is split among processes, e.g. one app server, one DB server, one web container...).
What is wrong with this line of reasoning is that it assumes that everything will work out as planned.
There is the assumption that it will be easier to write multithreaded programs in F# than in C#. Historically, functional languages have not done all that well in popularity, and there are probably reasons why. Therefore, while it is generally easier to multithread functional languages than imperative ones, it has generally been easier to find people to program in imperative languages. These two things balance out somehow, probably depending on the people and the app. It may or may not be easier in general to write multithreaded applications in functional or imperative languages. It's far too early to tell.
There's the assumption that people are going to demand efficient use of their 1K-core computers. There are always applications that can take as much CPU power as they can find, but these aren't the most common applications. Most applications people run are not in any way limited by CPU power nowadays, but by delays in local I/O, networking, and users. This may change, but it won't change at all quickly.
Also, it isn't clear that massively multicore processors are the wave of the future. There may be a fairly small market for them, so chip manufacturers may produce smaller chips instead of more powerful ones, or devote resources to other things that we aren't clear about right now.
There's the assumption that F# is going to be the winner among functional languages. As the VS 2010 functional language, it does have a considerable advantage. However, the race hasn't really started yet, and there's plenty of time for things to happen. It may turn out that F#.NET isn't a particularly good language to program massively parallel PCs, and something else may come about. It may happen that Microsoft and .NET won't be all that important by the time 64-core processors routinely come on cheap laptops. (Shifts like that aren't all that common, but they tend to come by surprise. They also are more likely to happen during times of conceptual change, and a mass move to functional languages would qualify.)
If F# continues to be the primary Microsoft functional language, Microsoft programming languages continue to be dominant, getting maximum performance out of massively multicore processors becomes important, the technical arguments aren't swamped by business inertia, and F# turns out to be considerably better than C# and similar languages at writing massively multithreaded applications, then you're right. However, that's a whole lot of assumptions strung together and linked by plausible reasons rather than rigid logic.
You seem to be trying to predict the future as a combination of next year's stuff extended by one line of reasoning about technical issues, and that's extremely unreliable.
The only 'case' against it (if there is such a thing) is that most modern, professional developers use different tools (as well as different tool types). F# brings some unique tools to the game, and those of us who embrace them will find our respective, specialized talents useful for other programming tasks -- especially those tasks involving analysis and manipulation of large data collections.
What I've seen of F# truly amazes me. Every demo leaves me grinning because F# strikes me as an advanced edition of what I remember from 'the good old days' when functional programming was much more common (probably more 'old' than 'good' to be sure, but such is nostalgia).
I disagree with the premise that C# is hard to parallelize. It really isn't if you know what you're doing. Additionally, Parallel LINQ will make this even easier. Do I wish there were an OpenMP for C#? Of course, but the tools C# provides allow you to do almost everything you want if you are good enough (and I feel one doesn't even have to be that good anymore).
There are a few things worth noting about technology:
The best technical solution is not always the most popular or most used. (And I don't know whether F# is any good.) I would argue that SQL is the most used programming language and the one most asked for by employers, and it's not a nice, cool, fast, friendly, fun language in my book. If the best technical solution always "won", how do you explain QWERTY keyboards? And if you ever read the "design" of x86/x64 processors... ;)
Azul, with its 864-core servers, uses Java exclusively, and the trend is toward bigger servers in the future.
If we assume the battle is between C# and F#, I do not think F# will win over C# within 2 years for the following reasons:
The features of F# that C# does not have are not features people have been missing. For instance, I think Seq.map, Seq.iter, Seq.fold and friends are great, but I don't see a majority of developers switching from foreach to these constructs.
The performance benefits of multicores are irrelevant to most existing programs, as only a few programs are CPU-bound. For those programs where performance really is important (e.g. video games), C++ will remain predominant, at least for the next 2 years. It's not that hard to use threads in C++, assuming one avoids side effects (which you can decide to do even in C++). Isn't that what Google is doing?
For F# to become really big, I think it has to become one of the main languages used in teaching, the way Java has been. This is actually quite likely, seeing how the academic world is fond of functional languages. Should that happen, I don't think the effects will become visible before 5 years.
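For readers who haven't used them, Seq.map and Seq.fold correspond roughly to Python's map and functools.reduce. The sketch below (illustrative function names) computes the same total both ways, showing that the two styles express the same computation and differ mainly in whether an accumulator is mutated.

```python
from functools import reduce


def total_with_tax_loop(prices, rate):
    # Imperative style: a foreach-style loop mutating an accumulator.
    total = 0.0
    for p in prices:
        total += p * (1 + rate)
    return total


def total_with_tax_fold(prices, rate):
    # Functional style: map each price, then fold the results into a sum with
    # an explicit seed; the same shape as Seq.map followed by Seq.fold in F#.
    taxed = map(lambda p: p * (1 + rate), prices)
    return reduce(lambda acc, p: acc + p, taxed, 0.0)
```

Both return 34.125 for prices [12.0, 7.5, 3.25] and rate 0.5.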
Linking assemblies together is not trivial.
F# is tied to the .NET typing system, which is significantly more restricted than, say, PHP. It's probably right up there with Java in the land of Strong Typing. That makes the entry barrier pretty high for someone who isn't intimately familiar with the .NET types.
Single-assignment code is hard to write. Most algorithms are written for the typical Turing machine model, which permits multiple assignments, and single-assignment code does not fit neatly into a good model for How We Think. At least, not for those of us who write Turing Machine code for a living. Perhaps it's different for those of us who write Lambda Machine code out there...
F# is tied to Microsoft, which produces knee-jerk hate from many geeks. They would rather use Lisp or Scheme or Haskell (or whatever). Although Mono supports it, it didn't support it well the last time I tried it on Mono (it was quite slow).
Most of our existing code lives in imperative, sequential code bases, and most of our applications are oriented around imperative, sequential operations with side-effects.
Which is all to say, pure functional approaches do not neatly model the real world, so F# is going to have to carve out a niche where it easily manages real-world problems. It cannot be a general purpose language, because it does not neatly solve general purpose problems.
Learning from my last question, most member names seem to get included in the Project Output.
Looking at some decompilers like 9rays, Salamander, and Jungle, it seems many obfuscation techniques have already been defeated. There's one particularly scary claim:
Automatically removes string encryptions injected by obfuscators ~ Salamander
So is manual, source-code-level obfuscation more effective than the post-compile or mid-compile, 'superficial' obfuscation layered on by well-known (easily defeated?) obfuscation programs?
Obfuscating source-code is going to be self-defeating in terms of maintenance.
If your project is so 'secret', I guess you have two choices:
Place the 'secret' proprietary code behind a service on a server that you control
Code it in a language that is not as easy to decompile, such as C/C++
Maybe, debatably, but you'll destroy maintainability to do so.
Is this really worth it?
Actually, this just comes down to security through obscurity, i.e. it's not security at all, it's just an inconvenience. You should work from the assumption that any party interested enough will decompile your code if they can access it. It's not worth the pain you'll inflict on yourself to make it very slightly more time-consuming for the evil haxxors. Deal with the real security problems of access.
As people have stated, obfuscation is about raising the bar. If you obfuscate your assembly, you will stop a casual developer who's just curious, but you won't stop a slightly motivated person from reverse engineering it.
If you want to raise the bar a little further, many obfuscation tools let you use non-printable characters as member names; use Reflector on itself to have a look. This will stop a lot more people. I might look at obfuscated code to understand it, but if I can't read it, I'm not going to go through the pain of dumping it to IL and renaming all the members manually; I have no motivation to waste that much time.
However, some people do have that motivation, so you need to go another step if your business requirements necessitate it. But no matter what you do, if the computer can read it, there will be someone out there who can read it too. The goal is to reduce the number of people who can read it or would be motivated to read it.
There are also some tricks you can use to make Reflector break (the obfuscator from PreEmptive breaks Reflector in some cases, but of course you can still read the IL). I once had an interesting conversation with a developer of an obfuscation tool, and I won't be able to do it justice, but he had a way to make Reflector break completely by having the code jump around dynamically: for example, one moment you are in function a, then you jump to the middle of function b. Doing this causes PEVerify to raise errors, so they never actually implemented it, but it's a neat idea.
annakata is correct. Really all you can do is make it more difficult (and costly) for the person to reverse engineer the software.
My company identified several areas in which we wanted to make reverse engineering as difficult as possible. For example, our files use a binary format in which each object in our hierarchy is responsible for saving itself and reading back the correct version. This means that for a person to read our files, they would have to replicate our entire hierarchy in the code they write to read them. In addition, much of the information in the job file is useless without the corresponding bit in the shop standards files, so they have to do the work twice in order to understand what the job file is saying.
Several critical areas (dongle protection, communication with our metal cutting machines) reside in Win32 DLLs, which means that to reverse engineer our software, they would have to know assembly and how to make DLLs that replicate other DLLs' signatures. Plus, our CAM software is designed to be highly interactive with the cutting machine (information is being exchanged all the time).
The few times we heard about competitors trying to deal with our machines on their own, they wound up replacing the electronics with their own in order to finish the job. That costs major bucks.
Some of the steps we took were based on our own experience of trying to deal with competitors' machines and software. We took that experience and learned how to tweak our setup. Of course, we have limits, in that we are not going to sacrifice reliability or maintainability just for the purpose of defeating reverse engineering.
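To make the "each object saves itself and reads back the correct version" idea concrete, here is a minimal Python sketch; it is not the company's actual format, which isn't described. The `Part` class, its fields, and the version numbers are hypothetical. Each record starts with a version tag, and the loader has to know how every version was laid out, which is exactly the knowledge an outsider would have to reconstruct from raw bytes.

```python
import io
import struct
from typing import BinaryIO


def _write_str(out: BinaryIO, s: str) -> None:
    # Length-prefixed UTF-8 string: 4-byte little-endian length, then the bytes.
    data = s.encode("utf-8")
    out.write(struct.pack("<I", len(data)))
    out.write(data)


def _read_str(inp: BinaryIO) -> str:
    (length,) = struct.unpack("<I", inp.read(4))
    return inp.read(length).decode("utf-8")


class Part:
    """Hypothetical object in the hierarchy that saves and loads itself."""

    VERSION = 2  # bumped when the 'grade' field was added

    def __init__(self, name: str, thickness: float, grade: str = "") -> None:
        self.name = name
        self.thickness = thickness
        self.grade = grade

    def save(self, out: BinaryIO) -> None:
        # Each object writes its own version tag first, then its fields.
        out.write(struct.pack("<H", self.VERSION))
        _write_str(out, self.name)
        out.write(struct.pack("<d", self.thickness))
        _write_str(out, self.grade)

    @classmethod
    def load(cls, inp: BinaryIO) -> "Part":
        # The reader has to know the field layout of every version ever written.
        (version,) = struct.unpack("<H", inp.read(2))
        name = _read_str(inp)
        (thickness,) = struct.unpack("<d", inp.read(8))
        grade = _read_str(inp) if version >= 2 else ""  # v1 records had no grade
        return cls(name, thickness, grade)


# Round trip through an in-memory "file".
buf = io.BytesIO()
Part("bracket", 3.5, "A36").save(buf)
buf.seek(0)
part = Part.load(buf)
print(part.name, part.thickness, part.grade)  # bracket 3.5 A36
```

A real file would nest many such objects, so reading it means rebuilding the whole class hierarchy, not just one record type.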
For your case, you will have to ask yourself what part of your software would be of interest to your competitors and proceed from there. If you are a vertical market developer (machine control, specialized accounting, etc) I suggest using a USB dongle for software control.
Otherwise, use a serial number system, accept that people are going to pirate your software, and build that into your business model. The purpose of a serial number scheme is that it is relatively unintrusive, hinders casual copying, and gives you a remote chance of tracking down where a copy came from.
The problem there is you will be sacrificing readability to do it. If your project is that sacred to protect, I believe it is safe to assume two things:
The project is large enough that the hit in readability will come back to bite you in the ass.
The people who want to reverse-engineer it will do so anyway. It will just take a slightly larger feat of intelligence to determine what things do (instead of just reading the member names).
I am alarmed that you're even considering code level obfuscation. Won't you be obfuscating the code for yourself too? How do you intend to ever work on it again? For the sake of maintainability this shouldn't be done.
But consider this:
Suppose there were a script or app that you could run that would open your project and cleverly obfuscate every string and variable name in it, and you compiled the result afterward while your original code stayed safely untouched in a separate location.
Now that's some idea.
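A rough Python sketch of that script, under stated assumptions: it copies the source tree to a separate build directory and renames a hand-supplied list of identifiers in the copy's `.cs` files only, so the original stays readable. The function name and the `_0`, `_1` naming scheme are hypothetical, and a plain regex is a simplification; a real tool would need to parse C# to avoid touching string literals, reflection targets, and public APIs.

```python
import re
import shutil
from pathlib import Path


def obfuscated_copy(src: Path, dst: Path, identifiers: list[str]) -> dict[str, str]:
    """Copy the tree at src to dst, then rename identifiers in the copy's .cs files.

    The tree at src is never modified. Returns the old-to-new name mapping.
    """
    shutil.copytree(src, dst)  # raises if dst already exists
    mapping = {name: f"_{i}" for i, name in enumerate(identifiers)}
    if not mapping:
        return mapping
    # \b keeps "Total" from matching inside "TotalCost". This naive regex also
    # rewrites matches inside string literals and comments.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, identifiers)) + r")\b")
    for path in dst.rglob("*.cs"):
        text = path.read_text(encoding="utf-8")
        path.write_text(pattern.sub(lambda m: mapping[m.group(1)], text), encoding="utf-8")
    return mapping
```

The copy is what gets compiled; the original tree is never written to.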
Actually code level obfuscation is less secure than what the obfuscators out there can do. This is primarily because obfuscators can take advantage of strict CLI implementation details that are not permitted by language compilers. For instance, it is entirely legal for private fields to all have the same name - but there isn't a compiler out there that will let you do that.
You can use a technique like this: http://g.palem.in/SecureAssembly.html. With it, you write in .NET, but you embed your .NET executable into a C++ executable.
I wonder why a C++, C#, or Java developer would want to learn a dynamic language.
Assuming the company won't switch its main development language from C++/C#/Java to a dynamic one what use is there for a dynamic language?
What helper tasks can be done by the dynamic languages faster or better after only a few days of learning than with the static language that you have been using for several years?
Update
After seeing the first few responses it is clear that there are two issues.
My main interest would be something that is justifiable to the employer as an expense.
That is, I am looking for justifications for the employer to finance the learning of a dynamic language. Aside from the obvious benefit that the employee will have a broader view, employers are usually looking for some "real" benefit.
A lot of times, some quick task comes up that isn't part of the main software you are developing. Sometimes the task is a one-off, i.e. compare this file to the database and let me know the differences. It is a lot easier to do text parsing in Perl/Ruby/Python than it is in Java or C# (partially because it is a lot easier to use regular expressions). It will probably take a lot less time to parse the text file using Perl/Ruby/Python (or maybe even VBScript, cringe) and then load it into the database than it would to create a Java/C# program to do it or to do it by hand.
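As an illustration of the "compare this file to the database" one-off, a minimal Python sketch using only the standard library. It assumes the file is a CSV export with `id` and `email` columns and that the database has a `customers` table with the same columns; an in-memory SQLite database stands in for the real one, and the data are made up.

```python
import csv
import io
import sqlite3


def diff_file_against_db(csv_text: str, conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Compare (id, email) rows in a CSV export with the customers table."""
    file_rows = {row["id"]: row["email"] for row in csv.DictReader(io.StringIO(csv_text))}
    db_rows = {str(cid): email for cid, email in conn.execute("SELECT id, email FROM customers")}
    common = file_rows.keys() & db_rows.keys()
    return {
        "only_in_file": sorted(file_rows.keys() - db_rows.keys()),
        "only_in_db": sorted(db_rows.keys() - file_rows.keys()),
        "changed": sorted(k for k in common if file_rows[k] != db_rows[k]),
    }


# Stand-in database with three customers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "a@example.com"), (2, "b@example.com"), (3, "c@example.com")])
export = "id,email\n1,a@example.com\n2,bb@example.com\n4,d@example.com\n"
print(diff_file_against_db(export, conn))
# {'only_in_file': ['4'], 'only_in_db': ['3'], 'changed': ['2']}
```

For a real file you would pass its contents (for example from `open(path).read()`) and a connection to the real database.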
Also, due to the ease at which most of the dynamic languages parse text, they are great for code generation. Sure your final project must be in C#/Java/Transact SQL but instead of cutting and pasting 100 times, finding errors, and cutting and pasting another 100 times it is often (but not always) easier just to use a code generator.
A recent example at work: we needed to get data from another accounting system into ours. Our system has an import format, but the old system had a completely different format (fixed width, although some things had to be matched). The task was not to create a program to migrate the data over and over again; it was to shove the data into our system once and then maintain it there going forward. So even though we are a C# and SQL Server shop, I used Python to convert the data into a format that our application could import. Ultimately it doesn't matter that I used Python; it matters that the data is in the system. My boss was pretty impressed.
Where I often see dynamic languages used is testing. It is much easier to create a Python/Perl/Ruby program that connects to a web service and throws some data at it than it is to create the equivalent Java program. You can also quite easily use Python to drive command-line programs, generate a ton of garbage (but still valid) test data, etc.
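A sketch of that kind of one-time conversion in Python: slice each fixed-width record into fields and write CSV for the importer. The column positions, field names, the amount-in-cents convention, and the `YYYYMMDD` dates are all hypothetical, since neither system's real layout is described; the matching step mentioned above is left out.

```python
import csv
import io
from decimal import Decimal

# Hypothetical old-system layout: (field name, start, end) as 0-based slice bounds.
LAYOUT = [("account", 0, 6), ("date", 6, 14), ("amount", 14, 24), ("memo", 24, 44)]


def fixed_width_to_csv(lines: list[str]) -> str:
    """Convert fixed-width records into CSV text for the new system's importer."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([name for name, _, _ in LAYOUT])
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        rec = {name: line[start:end].strip() for name, start, end in LAYOUT}
        # Assumed: amounts stored as integer cents with no decimal point.
        rec["amount"] = f"{Decimal(rec['amount']) / 100:.2f}"
        # Assumed: dates stored as YYYYMMDD; emit ISO 8601 instead.
        d = rec["date"]
        rec["date"] = f"{d[:4]}-{d[4:6]}-{d[6:8]}"
        writer.writerow([rec[name] for name, _, _ in LAYOUT])
    return out.getvalue()


record = "100200" + "20081015" + "0000012345" + "Office supplies".ljust(20)
print(fixed_width_to_csv([record]), end="")
# account,date,amount,memo
# 100200,2008-10-15,123.45,Office supplies
```

`Decimal` is used instead of `float` so that amounts never pick up binary rounding errors on the way through.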
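For the "garbage (but still valid)" test data, a small sketch: random records that still satisfy some validity rules. The record shape and the rules (non-empty alphabetic name, well-formed address at `example.com`, quantity from 1 to 999) are assumptions for illustration. Seeding the generator makes a failing run reproducible.

```python
import random
import string


def random_record(rng: random.Random) -> dict:
    """One nonsense-but-valid order record under the assumed rules."""
    name = "".join(rng.choices(string.ascii_letters, k=rng.randint(1, 12)))
    user = "".join(rng.choices(string.ascii_lowercase + string.digits, k=8))
    return {
        "customer": name,
        "email": f"{user}@example.com",
        "quantity": rng.randint(1, 999),  # randint is inclusive at both ends
    }


def generate(n: int, seed: int = 42) -> list[dict]:
    rng = random.Random(seed)  # private, seeded generator: same seed, same data
    return [random_record(rng) for _ in range(n)]
```

Feeding a few thousand of these at a service or command-line program quickly shakes out input-handling bugs that hand-written cases miss.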
The other thing that dynamic languages are big on is code generation. Creating the C#/C++/Java code. Some examples follow:
The first code generation task I often see is people using dynamic languages to maintain constants in the system. Instead of hand coding a bunch of enums, a dynamic language can be used to fairly easily parse a text file and create the Java/C# code with the enums.
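A minimal sketch of that constants workflow: parse `Name = value` lines from a text file and emit a C# enum. The function name, the input format, the `#` comment convention, and the default `Generated` namespace are assumptions for illustration.

```python
def generate_csharp_enum(enum_name: str, source: str, namespace: str = "Generated") -> str:
    """Turn 'Name = value' lines (with optional '#' comments) into a C# enum."""
    members = []
    for raw in source.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        if not name.isidentifier():  # Python's rule, close enough to C#'s for a sketch
            raise ValueError(f"not a valid identifier: {name!r}")
        members.append(f"        {name} = {int(value)},")
    body = "\n".join(members)
    return (
        "// Auto-generated from the constants file; do not edit by hand.\n"
        f"namespace {namespace}\n"
        "{\n"
        f"    public enum {enum_name}\n"
        "    {\n"
        f"{body}\n"
        "    }\n"
        "}\n"
    )


constants = """
# order states shared with the warehouse system
Pending = 1
Shipped = 2   # handed to the carrier
Cancelled = 9
"""
print(generate_csharp_enum("OrderStatus", constants))
```

Running a script like this as a build step keeps the C# enum and the text file from drifting apart (C# accepts the trailing comma after the last member).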
SQL is a whole other ball game, but often you get better performance by cutting and pasting 100 times instead of writing a function (because of execution plan caching, or because putting complicated logic in a function forces row-by-row processing instead of set-based processing). In fact, it is quite useful to use the table definition to generate certain stored procedures automatically.
It is always better to get buy-in for a code generator. But even if you don't, is it more fun to spend time cutting and pasting, or to create a Perl/Python/Ruby script once and have it generate the code? If it takes you hours to hand code something but less time to create a code generator, then even if you use it only once, you have saved time and hence money. If it takes you longer to create a code generator than to hand code once, but you know you will have to update the code more than once, it may still make sense. If it takes you 2 hours to hand code, 4 hours to write the generator, and you know you'll have to hand code equivalent work another 5 or 6 times, then it is obviously better to create the generator.
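To illustrate generating stored procedures from the table definition, a sketch that builds a T-SQL insert procedure from a list of columns. The function name, the `usp_Insert` prefix convention, the `dbo` schema, and the example table are hypothetical; in practice the column list would be queried from `INFORMATION_SCHEMA.COLUMNS` rather than typed in.

```python
def insert_proc(table: str, columns: list[tuple[str, str]]) -> str:
    """Build a CREATE PROCEDURE that inserts one row into `table`.

    columns: (column name, SQL type) pairs, e.g. ("Email", "NVARCHAR(255)").
    """
    params = ",\n".join(f"    @{name} {sql_type}" for name, sql_type in columns)
    col_list = ", ".join(f"[{name}]" for name, _ in columns)
    values = ", ".join(f"@{name}" for name, _ in columns)
    return (
        f"CREATE PROCEDURE [dbo].[usp_Insert{table}]\n"
        f"{params}\n"
        "AS\n"
        "BEGIN\n"
        "    SET NOCOUNT ON;\n"
        f"    INSERT INTO [dbo].[{table}] ({col_list})\n"
        f"    VALUES ({values});\n"
        "END\n"
    )


print(insert_proc("Customer", [("Name", "NVARCHAR(100)"), ("Email", "NVARCHAR(255)")]))
```

Looping this over every table in the schema produces the repetitive procedures without any cutting and pasting.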
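Working through the numbers in that last case, assuming that rerunning the generator costs next to nothing and that the work is needed the first time plus 5 or 6 more times:

$$
\text{hand coding: } 2\,\mathrm{h} \times (1 + 5) = 12\,\mathrm{h} \ \text{ to } \ 2\,\mathrm{h} \times (1 + 6) = 14\,\mathrm{h},
\qquad \text{generator: } \approx 4\,\mathrm{h}
$$

More generally, with $h$ hours to hand code once, $g$ hours to build the generator, and $n$ occurrences of the work, the generator pays off when

$$
n\,h > g \iff n > \frac{g}{h} = \frac{4}{2} = 2,
$$

so under these assumptions it wins from the third occurrence on.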
Also some things are easier with dynamic languages than Java/C#/C/C++. In particular regular expressions come to mind. If you start using regular expressions in Perl and realize their value, you may suddenly start making use of the Java regular expression library if you haven't before. If you have then there may be something else.
I will leave you with one last example of a task that would have been great for a dynamic language. My workmate had to take a directory full of files and burn them to various CDs for various customers. There were only a few customers but a lot of files, and you had to look inside each file to see what it was. He did this task by hand. A Java/C# program would have saved time, but for a one-time job, with all the development overhead, it isn't worth it. However, slapping something together in Perl/Python/Ruby probably would have been worth it. He spent several hours doing it; it would have taken less than one hour to create a Python script to inspect each file, match it to its customer, and then move the file to the appropriate place. Again, this is not part of the standard job, but the task came up as a one-off. Is it better to do it by hand, spend a larger amount of time making Java/C# do the task, or spend a much smaller amount of time doing it in Python/Perl/Ruby? If you are using C or C++, the point is even more dramatic because of the extra concerns of programming in C or C++ (pointers, no array bounds checking, etc.).
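A sketch of the script described above: inspect each file, decide which customer it belongs to, and move it into a per-customer folder ready for burning. The matching rule (a `Customer: name` line near the top of each file) and the function name are hypothetical stand-ins for whatever actually identified the customer; files that match no rule are left in place for a human to check.

```python
import re
import shutil
from pathlib import Path

# Hypothetical rule: each file names its customer in a header line.
CUSTOMER_LINE = re.compile(r"^Customer:\s*(\w+)", re.MULTILINE)


def sort_by_customer(inbox: Path, outbox: Path) -> dict[str, list[str]]:
    """Move each file in `inbox` to `outbox/<customer>/`; return what went where."""
    moved: dict[str, list[str]] = {}
    for path in sorted(inbox.iterdir()):  # listing is materialized before any moves
        if not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="replace") as f:
            head = f.read(4096)  # only look at the start of each file
        match = CUSTOMER_LINE.search(head)
        if match is None:
            continue  # unknown customer: leave it for manual review
        customer = match.group(1)
        target = outbox / customer
        target.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target / path.name))
        moved.setdefault(customer, []).append(path.name)
    return moved
```

The returned mapping doubles as a checklist of which files ended up on which customer's CD.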
Let me turn your question on its head by asking what use it is to an American English speaker to learn another language?
The languages we speak (and those we program in) inform the way we think. This can happen on a fundamental level, such as C++ versus JavaScript versus Lisp, or on an implementation level, in which a Ruby construct provides a eureka moment for a solution in your "real job."
Speaking of your real job, if the market goes south and your employer decides to "right size" you, how do you think you'll stack up against a guy who is flexible because he's written software in tens of languages, instead of your limited exposure? All things being equal, I think the answer is clear.
Finally, you program for a living because you love programming... right?
I don't think anyone has mentioned this yet. Learning a new language can be fun! Surely that's a good enough reason to try something new.
I primarily program in Java and C# but use dynamic languages (Ruby/Perl) to support smoother deployment, kicking off OS tasks, automated reporting, some log parsing, etc.
After a short time learning and experimenting with Ruby or Perl, you should be able to write some regex-based scripts that alter data formats or grab information from logs. An example of a small Ruby/Perl script that could be written quickly is one that parses a very large log file and reports only a few events of interest, in either a human-readable or a CSV format.
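A minimal Python version of that log-filtering script (the same idea carries over to Ruby or Perl): it streams the log line by line, so even a very large file isn't read into memory at once, and writes only the events of interest as CSV. The log line layout and the event names in `INTERESTING` are assumptions.

```python
import csv
import re
from typing import Iterable, TextIO

# Assumed format: "2009-01-15 10:32:01 LEVEL EVENT_NAME free text..."
LINE = re.compile(r"^(\S+ \S+) (\w+) (\w+) ?(.*)$")
INTERESTING = {"LOGIN_FAILED", "DISK_FULL"}


def extract_events(lines: Iterable[str], out: TextIO) -> int:
    """Write matching events to `out` as CSV; return how many were written."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["timestamp", "level", "event", "detail"])
    count = 0
    for line in lines:  # an open file can be passed directly and is read lazily
        m = LINE.match(line.rstrip("\n"))
        if m and m.group(3) in INTERESTING:
            writer.writerow(m.groups())
            count += 1
    return count
```

Typical use would be `with open("server.log") as src, open("events.csv", "w", newline="") as dst: extract_events(src, dst)`, with both file names being placeholders.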
Also, having experience with a variety of different programming languages should help you think of new ways to tackle problems in more structured languages like Java, C++, and C#.
One big reason to learn Perl or Ruby is to help you automate any complicated tasks that you have to do over and over.
Or if you have to analyse the contents of log files and need more munging than grep, sed, etc. can comfortably provide.
Also using other languages, e.g. Ruby, that don't have much "setup cost" will let you quickly prototype ideas before implementing them in C++, Java, etc.
HTH
cheers,
Rob
Do you expect to work for this company forever? If you're ever out on the job market, perhaps some prospective employers will be aware of the Python paradox.
A good hockey player plays where the puck is. A great hockey player plays where the puck is going to be.
- Wayne Gretzky
Our industry is always changing. No language can be mainstream forever. To me, Java, C++, and .NET are where the puck is right now, and Python, Ruby, and Perl are where the puck is going to be. Decide for yourself if you wanna be good or great!
Paul Graham posted an article several years ago about why Python programmers made better Java programmers. (http://www.paulgraham.com/pypar.html)
Basically, regardless of whether the new language is relevant to the company's current methodology, learning a new language means learning new ideas. If someone is willing to learn a language that isn't considered "business class", it shows that he is interested in programming beyond just earning a paycheck.
To quote Paul's site:
And people don't learn Python because it will get them a job; they learn it because they genuinely like to program and aren't satisfied with the languages they already know. Which makes them exactly the kind of programmers companies should want to hire. Hence what, for lack of a better name, I'll call the Python paradox: if a company chooses to write its software in a comparatively esoteric language, they'll be able to hire better programmers, because they'll attract only those who cared enough to learn it. And for programmers the paradox is even more pronounced: the language to learn, if you want to get a good job, is a language that people don't learn merely to get a job.
If an employer were willing to pay for the cost of learning a new language, chances are the people who volunteered to learn it (assuming it wasn't a mandatory class) would be the same people who are already on the "fast track".
When I first learned Python, I worked for a Java shop. Occasionally I'd have to do serious text-processing tasks which were much easier to do with quick Python scripts than Java programs. For example, if I had to parse a complex CSV file and figure out which of its rows corresponded to rows in our Oracle database, this was much easier to do with Python than Java.
More than that, I found that learning Python made me a much better Java programmer; having learned many of the same concepts in another language I feel that I understand those concepts much better. And as for what makes Python easier than Java, you might check out this question: Java -> Python?
Edit: I wrote this before reading the update to the original question. See my other answer for a better answer to the updated question. I will leave this as is as a warning against being the fastest gun in the west =)
Over a decade ago, when I was learning the ways of the Computer, the Old Wise Men With Beards explained how C and C++ are the tools of the industry. No one used Pascal and only the foolhardy would risk their companies with assembler.
And of course, no one would even mention the awful, slow, ugly thing called Java. It would never be a tool for serious business.
So. Um. Replace the languages in the above story and perhaps you can predict the future. Perhaps you can't. Point is, Java will not be the Last Programming Language ever and also you will most likely switch employers as well. The future is charging at you 24 hours per day. Be prepared.
Learning new languages is good for you. Also, in some cases it can give you bragging rights for a long time. My first university course was in Scheme. So when people talk to me about the new language du jour, my response is something like "First-class functions? That's so last century."
And of course, you get more stuff done with a high-level language.
Learning a new language is a long-term process. In a couple of days you'll learn the basics, yes. But! As you probably know, the real practical applicability of any language is tied to its standard library and other available components. Learning how to use them efficiently requires a lot of hands-on experience.
Perhaps the only immediate short-term benefit is that developers learn to recognize the nails that need a Python/Perl/Ruby hammer. And, if they are any good, they can then study some more (online, perhaps!) and become real experts.
The long-term benefits are easier to imagine:
The employee becomes a better developer. Better developer => better quality. We are living in a knowledge economy these days. It's wiser to invest in those brains that already work for you.
It is easier to adapt when the next big language emerges. It is very likely that the NBL will have many of the features present in today's scripting languages: first-class functions, closures, streams/generators, etc.
New market possibilities and ability to respond more quickly. Even if you are not writing Python, other people are. Your clients? Another vendor in the project? Perhaps a critical component was written in some other language? It will cost money and time, if you do not have people who can understand the code and interface with it.
Recruitment. If your company has a reputation of teaching new and interesting stuff to people, it will be easier to recruit the top people. Everyone is doing Java/C#/C++. It is not a very effective way to differentiate yourself in the job market.
Towards answering the updated question, it's a chicken-and-egg problem. The best way to justify an expense is to show how it reduces a cost somewhere else, so you may need to spend some extra personal time learning something first in order to build some kind of functional prototype.
Show your boss a demo like "Hey, I did this thing, and it saves me this much time [or better yet, this much $$]. Imagine how much money we would save if everyone could use this."
Then, after they agree, explain that it is built on some other technology and that it is worth the expense to get more training, and training for others on how to do it better.
I have often found that learning another language, especially a dynamically typed one, can teach you things about other languages and make you an overall better programmer. Learning Ruby, for example, will teach you object-oriented programming in ways Java won't, and vice versa. All in all, I believe it is better to be a well-rounded programmer than to be stuck in a single language. It makes you more valuable to the companies and clients you work for.
Check out the answers to this thread:
https://stackoverflow.com/questions/76364/what-is-the-single-most-effective-thing-you-did-to-improve-your-programming-ski#84112
Learning new languages is about keeping an open mind and learning new ways of doing things.
I'm not sure if this is what you are looking for, but at the small company I work for, we write our main application in Java but have used Python to write smaller scripts quickly: backup software, and temporary scripts to manipulate data and push out results. It just seems easier sometimes to sit down with Python and write a quick script than to mess with classes and stuff in Java.
Temp scripts that aren't going to stick around don't need a lot of design time wasted on them.
And I am lazy, but it is good to just learn as much as you can of course and see what features exist in other languages. Knowing more never hurts you in future career changes :)
It's all about broadening your horizons as a developer. If you limit yourself to statically typed languages, you may not end up the best programmer you could be.
As for tasks, Python/Lua/Ruby/Perl are great for small simple tasks, like finding some files and renaming them. They also work great when paired with a framework (e.g. Rails, Django, Lua for Windows) for developing simple apps quickly. Hell, 37Signals is based on creating simple yet very useful apps in Ruby on Rails.
They're useful for the "quick hack": plugging a gap in your main language with a quick (and potentially dirty) fix faster than you could develop the same thing in your main language. An example: a simple Perl script that goes through a large text file and replaces all instances of one email address with another is trivial and takes around 10 minutes to write. Hacking a console app together to do the same in your main language would take several times that.
You also have the benefit that exposing yourself to additional languages broadens your abilities, and learning to attack problems from a different language's perspective can be as valuable as the language itself.
Finally, scripting languages are very useful in the realm of extension. Take Lua as an example. You can bolt a Lua interpreter into your app with very little overhead, and you then have a way to create rich scripting functionality that can be exposed to end users, or altered and distributed quickly without requiring a rebuild of the entire app. This is used to great effect in many games, most notably World of Warcraft.
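Roughly the same quick hack in Python, as a sketch: it streams the file line by line into a temporary file in the same directory and then swaps it into place, so a very large file is never loaded whole and a crash midway leaves the original intact. The function name is hypothetical, and the file is assumed to be UTF-8 text.

```python
import os
import shutil
import tempfile


def replace_in_file(path: str, old: str, new: str) -> int:
    """Replace every occurrence of `old` with `new` in a text file; return the count."""
    count = 0
    # Temp file in the same directory, so the final os.replace is a simple rename.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        # fdopen first so the descriptor is closed even if opening `path` fails;
        # newline="" keeps the file's original line endings untouched.
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as dst, \
                open(path, encoding="utf-8", newline="") as src:
            for line in src:  # one line at a time, so file size doesn't matter
                count += line.count(old)
                dst.write(line.replace(old, new))
        shutil.copymode(path, tmp_path)  # keep the original file's permissions
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
    return count
```

Calling `replace_in_file("big.txt", "old@example.com", "new@example.com")` returns the number of replacements made; the file name and addresses are placeholders.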
Personally I work on a Java app, but I couldn't get by without perl for some supporting scripts.
I've got scripts to quickly flip what db I'm pointing at, scripts to run build scripts, scripts to scrape data & compare stuff.
Sure I could do all that with java, or maybe shell scripts (I've got some of those too), but who wants to compile a class (making sure the classpath is set right etc) when you just need something quick and dirty. Knowing a scripting language can remove 90% of those boring/repetitive manual tasks.
Learning something with a flexible OOP system, like Lisp or Perl (see Moose), will allow you to better expand and understand your thoughts on software engineering. Ideally, every language has some unique facet (whether it be CLOS or some other technique) that enhances, extends and grows your abilities as a programmer.
If all you have is a hammer, every problem begins to look like a nail.
There are times when having a screwdriver or pair of pliers makes a complicated problem trivial.
Nobody asks contractors, carpenters, etc., "Why learn to use a screwdriver if I already have a hammer?". Really good contractors and carpenters have tons of tools and know how to use them well. All programmers should be doing the same thing: learning to use new tools and to use them well.
But before we use any power tools, let's take a moment to talk about shop safety. Be sure to read, understand, and follow all the safety rules that come with your power tools. Doing so will greatly reduce the risk of personal injury. And remember this: there is no more important rule than to wear these: safety glasses
-- Norm
I think the main benefits of dynamic languages can be boiled down to:
Rapid development
Glue
The short design-code-test cycle time makes dynamic languages ideal for prototyping, tools, and quick & dirty one-off scripts. IMHO, the latter two can make a huge impact on a programmer's productivity. It amazes me how many people trudge through things manually instead of whipping up a tool to do it for them. I think it's because they don't have something like Perl in their toolbox.
The ability to interface with just about anything (other programs or languages, databases, etc.) makes it easy to reuse existing work and automate tasks that would otherwise need to be done manually.
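As a small example of the glue role, a Python sketch that runs another command-line program and turns its text output into data. To keep it self-contained, the "other program" is just a second Python interpreter printing name/size pairs; in practice it would be an existing tool whose output format you know. The function name is hypothetical.

```python
import subprocess
import sys


def run_and_collect(args: list[str]) -> list[str]:
    """Run a command, raise CalledProcessError on a non-zero exit, return stdout lines."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# Stand-in for an existing tool: a child process that prints "name size" pairs.
child = [sys.executable, "-c", "print('a.log 120'); print('b.log 4096')"]
sizes = {name: int(size) for name, size in (line.split() for line in run_and_collect(child))}
print(sizes)  # {'a.log': 120, 'b.log': 4096}
```

From here the dictionary could just as easily be written to a database or fed to the next program in the chain.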
Don't tell your employer that you want to learn Ruby. Tell him you want to learn about the state of the art in web framework technologies. It just happens that the hottest ones are Django and Ruby on Rails.
I have found the more that I play with Ruby, the better I understand C#.
1) As you switch between these languages, you'll see that each of them has its own constructs and philosophy behind the problems it tries to solve. This will help you find the right tool for the job or for the domain of a problem.
2) The role of the compiler (or interpreter, for some languages) becomes more prominent. Why does Ruby's type system differ from the .NET/C# system? What problems does each of them solve? You'll find yourself understanding, at a lower level, the constructs of the compiler and its influence on the language.
3) Switching between Ruby and C# really helped me to understand Design Patterns better. I really suggest implementing common design patterns in a language like C# and then in a language like Ruby. It often helped me see through some of the compiler ceremony to the philosophy of a particular pattern.
4) A different community. C#, Java, Ruby, Python, etc all have different communities that can help engage your abilities. It is a great way to take your craft to the next level.
5) Last, but not least, because new languages are fun :)
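To illustrate point 3 with one pattern: in C# the Strategy pattern is usually an interface plus one class per strategy, while in a language with first-class functions the strategy can simply be a function passed in. A Python sketch; the `Order` class and the shipping rules are invented for illustration.

```python
from typing import Callable

# A strategy is just a function from order weight (kg) to shipping cost.
ShippingStrategy = Callable[[float], float]


def flat_rate(weight: float) -> float:
    return 5.0


def by_weight(weight: float) -> float:
    return 1.5 * weight


def free_over_ten(weight: float) -> float:
    return 0.0 if weight > 10 else by_weight(weight)


class Order:
    def __init__(self, weight: float, shipping: ShippingStrategy) -> None:
        self.weight = weight
        self.shipping = shipping  # swapping strategies means passing a different function

    def shipping_cost(self) -> float:
        return self.shipping(self.weight)


print(Order(4, by_weight).shipping_cost())  # 6.0
```

The interface and the concrete classes disappear, which makes the core idea (behavior passed around as a value) easier to see. C# delegates and lambdas allow the same style, so the insight carries back.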
Given the increasing focus on running dynamic languages on the JVM (the Da Vinci Machine project, etc.) and the increasing number of dynamic languages that do run on it (JRuby, Groovy, Jython), I think the use cases are only increasing. Some of the scenarios where I found them really beneficial are:
Prototyping: use RoR or Grails to build quick prototypes, with the advantage of being able to run them on the standard app server and (maybe) reuse existing services, etc.
Testing: write unit tests much, much faster in dynamic languages.
Performance/automation test scripting: some of these tools are starting to allow the use of a standard dynamic language of choice to write the test scripts instead of proprietary scripting languages. A side benefit might be being able to reuse some unit test code you've already written.
Philosophical issues aside, I know that I have gotten value from writing quick-and-dirty Ruby scripts to solve brute-force problems that Java was just too big for. Last year I had three separate directory structures that were all more-or-less the same, but with lots of differences among the files (the client hadn't heard of version control and I'll leave the rest to your imagination).
It would have taken a great deal of overhead to write an analyzer in Java, but in Ruby I had one working in about 40 minutes.
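A rough idea of what such an analyzer might look like, sketched in Python rather than the Ruby of the original story: hash every file under each tree by relative path, then report which paths are identical everywhere, which differ, and which are missing from at least one tree. The function names are hypothetical, and reading whole files into memory is fine for a sketch but an assumption about file sizes.

```python
import hashlib
from pathlib import Path


def tree_hashes(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to a SHA-256 of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_trees(*roots: Path) -> dict[str, list[str]]:
    """Classify every relative path as identical, different, or missing somewhere."""
    hashes = [tree_hashes(r) for r in roots]
    all_paths = sorted(set().union(*hashes))  # union of every tree's relative paths
    report: dict[str, list[str]] = {"identical": [], "different": [], "missing": []}
    for rel in all_paths:
        present = [h[rel] for h in hashes if rel in h]
        if len(present) < len(roots):
            report["missing"].append(rel)
        elif len(set(present)) == 1:
            report["identical"].append(rel)
        else:
            report["different"].append(rel)
    return report
```

Calling `compare_trees(Path("copy1"), Path("copy2"), Path("copy3"))` on three real directories (hypothetical names) narrows the job down to the files actually worth diffing by hand.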
Often, dynamic languages (especially Python and Lua) are embedded in programs to add plugin-like functionality, because they are high-level languages that make it easy to add certain behavior where a low- or mid-level language is not needed.
Lua specifically lacks all the low-level system calls, because it was designed for ease of use in adding functionality within a program, not as a general-purpose programming language.
You should also consider learning a functional programming language like Scala. It has many of the advantages of Ruby, including a concise syntax and powerful features like closures. But it compiles to Java class files and can integrate seamlessly into a Java stack, which may make it much easier for your employer to swallow.
Scala isn't dynamically typed, but its "implicit conversion" feature gives many, perhaps even all of the benefits of dynamic typing, while retaining many of the advantages of static typing.
Dynamic languages are fantastic for prototyping ideas. Often, for performance reasons, they won't work for permanent solutions or products. But with languages like Python, which let you embed standard C/C++/Java code inside them or vice versa, you can speed up the really critical bits but leave everything glued together with the flexibility of a dynamic language.
...and so you get the best of both worlds. If you need to justify why more people should learn these languages, just point out how much faster you can develop the same software and how much more robust the solution is (because debugging and fixing problems in dynamic languages is, in my experience, considerably easier!).
Knowing grep and Ruby made it possible to narrow down, and verify the fix for, an issue involving tons of Java exceptions on some production servers. Because I threw the solution together in Ruby, it was done (designed, implemented, tested, run, bug-fixed, re-run, enhanced, results analyzed) in an afternoon instead of a couple of days. I could have solved the same problem with an all-Java or C# solution, but it most likely would have taken me longer.
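A minimal sketch of that kind of throwaway analysis, written in Python instead of Ruby: count the Java exception class names that appear in server logs, so the noisy ones stand out and the count can be rerun after the fix. The function name and the regex are assumptions, not the original script; the pattern is a simplification that only catches fully qualified names ending in `Exception` or `Error`.

```python
import re
from collections import Counter
from typing import Iterable

# Fully qualified Java class names ending in Exception or Error,
# e.g. java.lang.NullPointerException or java.sql.SQLException.
EXCEPTION = re.compile(r"\b((?:[a-z_$][\w$]*\.)+[A-Z][\w$]*(?:Exception|Error))\b")


def count_exceptions(lines: Iterable[str]) -> Counter:
    """Tally exception class names across all lines (e.g. an open log file)."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(EXCEPTION.findall(line))
    return counts
```

`count_exceptions(open("server.log")).most_common(10)` (placeholder file name) lists the ten most frequent exception types; running it before and after the fix gives a quick way to verify it.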
Having dynamic language expertise also sometimes leads you to simpler solutions in less dynamic languages. In ruby, perl or python, you just intuitively reach for associative arrays (hashes, dictionaries, whatever word you want to use) for the smallest things, where you might be tempted to create a complex class hierarchy in a statically typed language when the problem doesn't necessarily demand it.
Plus you can plug in most scripting languages into most runtimes. So it doesn't have to be either/or.
The "real benefit" that an employer could see is a better programmer who can implement solutions faster; however, you will not be able to provide any hard numbers to justify the expense and an employer will most likely have you work on what makes money now as opposed to having you work on things that make the future better.
The only time you can get training on the employer's dime, is when they perceive a need for it and it's cheaper than hiring a new person who already has that skill-set.