Convert assembly to C by replacing groups with Regex in C#

I want to convert assembly (asm) to C.
I saw this page: http://www.textmaestro.com/InfoEx_17_Convert_Assembly.htm
(please see the page)
and after that I tried to do the same job using find and replace with Regex in C#.
I am not a computer science student, so I am not proficient with Regex.
I have been working on this for 5 days and now I know that I can't do it. I wrote a lot of code, but without any success.
sample program:
mov r1,1;
mov r2,2;
should be converted to:
r1=1;
r2=2;
Please help me do this correctly.

OP has (painfully) learned that regexps are not a good solution to problems that involve analysis or translation of software. Processing strings simply is not the same as building context-sensitive analyses of text with complex structure.
People keep re-learning this lesson. It is true that you can use repeated regex to simulate Post rewriting systems, and Post systems, being Turing capable, can technically do anything. It is also true that nobody really wants to, or more importantly, nobody can write a very complex program for a real Turing machine [or an equivalent Post system]. This is why we have all these other computer languages and tools. [The TextMaestro system to which OP refers is trying to be exactly that Post system.]
However, the task he wants to do is possible and practical with the proper tools: program transformation systems (PTS).
In particular, he should see this technical paper for a description of precisely how this has been done with one particular PTS: Pigs from sausages? Reengineering from assembler to C via FermaT transformations. Such a tool is in effect a custom compiler from assembly source code to the target language, and includes parsing, name (label) resolution, often data flow analysis, and complex code generation and optimization. A PTS is used because it makes it relatively easy to build that kind of compiler. This tool has been used for at least Intel assembly to C, and mainframe (System 360/370/Z) assembly to C, for large-scale tasks. (I have no relationship to this tool but do have huge respect for the authors.)
The naysayers in the comments seem to think this is impossible to do except in extremely constrained circumstances. It is true that the more one knows about the assembly code in terms of idioms, the easier this gets, but the technical approach in the paper is not limited to specific compiler output by any means. It is also true that truly arcane assembler code (especially self-modifying code or code with runtime code generation) is extremely difficult to translate.
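For the two-line sample in the question, and only for input of that shape, a single regex replacement does work. The sketch below is a minimal illustration, assuming every line is a `mov` whose operands are bare register names or integer literals; the class and method names are made up for the example. It leaves anything else untouched, which is exactly where the approach stops scaling: memory operands, labels, flags and jumps need the real analysis described above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MovRewriter
{
    // Matches one "mov <dst>,<src>;" line whose operands are bare names or literals.
    // Anything else (memory operands, labels, other opcodes) is left untouched.
    private static readonly Regex Mov = new Regex(
        @"^(?<indent>[ \t]*)mov[ \t]+(?<dst>\w+)[ \t]*,[ \t]*(?<src>\w+)[ \t]*;",
        RegexOptions.Multiline | RegexOptions.IgnoreCase);

    public static string Rewrite(string asm)
    {
        // ${name} in the replacement string refers to the named capture groups.
        return Mov.Replace(asm, "${indent}${dst}=${src};");
    }

    public static void Main()
    {
        // Prints the two assignment lines from the question.
        Console.WriteLine(Rewrite("mov r1,1;\nmov r2,2;"));
    }
}
```

A line such as `add r1,r2;` passes through unchanged, so the output of this sketch is a mixture of C and untranslated assembly as soon as the input gets realistic.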

Related

Parsing C Header Files in C#

I'm working with Visual Studio C#, and I need to parse C header files to extract information only about the function declarations contained within. For each function I need the name, return type, and its parameters. If possible, I'd like the parameters in the order in which they appear in the function declaration.
I've seen stuff online about using Visual Studio's tags, or Exuberant Ctags, etc. But from what I gathered, those aren't really options that let me perform the parse from my C# program with C# code (I may be mistaken?). I've also looked through all the other answers to related questions, but they don't really seem to apply to my situation (I may just be dumb).
If I could at least get all the lines of code that represent function declarations I'd have a good start and could hand-parse the rest myself.
Thanks in advance
To "parse" C (header) files in a deep sense and pick up the type information for function declarations, in practice you need:
a full preprocessor (including the peccadilloes added by the vendor; MS has some pretty odd stuff in their headers),
a full (syntax) parser/AST builder for the C dialect of interest (there's no such thing as "C"; there is what the vendor offers in this revision of the compiler)
a full symbol table construction (because typedefs are aliases for the actual types of interest)
Many people will suggest "write your own parser (for C)". Mostly those people haven't done this; it's a lot more work to do this and get it right than they understand. If you don't start with production-level machinery, you won't get through real C header files without fixing it all.
Just parsing plain C is hard; consider the problem of parsing the ambiguous phrase
T*X;
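A classic parser cannot parse this without additional hackery: it is a declaration of X as a pointer to T if T names a type, and a multiplication expression otherwise.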
You will also not be able to parse a C header file by itself, in general. You need to have the source code context (often including the compiler command line) in which it is included, or typedefs, preprocessor conditionals and macros in a specific header file will be undefined and therefore unexpandable into the valid C that the compiler normally sees.
You are better off getting pre-existing pre-tested machinery that will do this for you. Clang comes to mind as an option, although I'm not sure it handles the MS header files. GCC is kind of an option, but it really, really wants to be a compiler, not your local friendly C source code analysis tool, and again I'm unsure of its support for MS dialects of C. Our DMS Software Reengineering Toolkit has all of the above for various MS dialects of C.
Having chosen a tool that can actually parse such headers, you'll likely want to do something with the collected header information. You are vague about what you want to accomplish. Having mentioned C# and C in the same breath, there's a hint that you want to call C programs from C# code, and thus need to generate C# equivalent APIs for the C code. For this you will need machinery to manipulate the type information provided, and to build the "text" for the C# declarations. For this, you are likely to find that you need other supporting tooling to do that part, too. Here GCC is a complete non-starter; it will offer you no additional help. Clang and DMS are both designed to be libraries of custom-tool building machinery.
Of course, this may all be moot depending on how much header file text you want to handle; if it is just one header file, doing it manually is probably easiest. You suggest you are willing to do that ("could hand-parse..."). In that case, all you really need to do is to run the preprocessor and interpret the output. I believe you can do this with command line switches for GCC and Clang and even the MS compilers; I know DMS can do this. For easily available options here, see How do I see a C/C++ source file after preprocessing in Visual Studio?
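As a rough illustration of the "run the preprocessor, then hand-parse" route: the preprocessed text can be produced with `gcc -E`, `clang -E` or `cl /E`, and a deliberately naive regex can then pull out the simplest prototypes. This is only a sketch under strong assumptions (one prototype per statement, no function pointers, no attributes or calling-convention keywords, no nested parentheses), and the type and method names are invented for the example. It will also report false positives such as `return foo(x);` inside inline function bodies, which is the kind of thing a real parser exists to avoid.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public sealed class Prototype
{
    public string ReturnType;
    public string Name;
    public string[] Parameters; // in declaration order
}

public static class NaivePrototypeScanner
{
    // <return type> <name> ( <params without nested parens> ) ;
    private static readonly Regex Proto = new Regex(
        @"^[ \t]*(?<ret>[\w \t\*]+?)[ \t]*\b(?<name>\w+)[ \t]*\((?<params>[^()]*)\)[ \t]*;",
        RegexOptions.Multiline);

    public static List<Prototype> Scan(string preprocessed)
    {
        var result = new List<Prototype>();
        foreach (Match m in Proto.Matches(preprocessed))
        {
            result.Add(new Prototype
            {
                ReturnType = m.Groups["ret"].Value.Trim(),
                Name = m.Groups["name"].Value,
                Parameters = m.Groups["params"].Value
                    .Split(',')
                    .Select(p => p.Trim())
                    .Where(p => p.Length > 0)
                    .ToArray()
            });
        }
        return result;
    }

    public static void Main()
    {
        string text = "int add(int a, int b);\nconst char *name(void);\n";
        foreach (var p in Scan(text))
            Console.WriteLine(p.ReturnType + " | " + p.Name + " | " + string.Join(", ", p.Parameters));
    }
}
```

Typedef'd parameter types come through as plain text here; resolving them to the underlying types is the symbol-table work listed above and is not attempted.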

Is the Roslyn model so C#/VB.NET centric that it precludes XAML analysis now and in the future?

I have just read the blog entry by JetBrains (Resharper) that suggests that Roslyn could never do XAML analysis:
Another core difference is that Roslyn covers exactly two languages, C# and VB.NET, whereas ReSharper architecture is multilingual
(quote from resharper blog)
For the uninitiated: ReSharper can do very good static analysis on XAML code, allowing code completion and refactoring together with C#.
I am curious. Is the Roslyn architecture general enough to be extended to languages other than C# and VB.NET, such as XAML, or is it very specific?
To avoid suggesting this is opinion based, I am looking for evidence in the source. Obviously any code can be refactored/re-engineered over time to fit some other purpose, but I'm only interested in current evidence in the source, or references to quotes from Roslyn developers indicating that there is intent to extend Roslyn as an analysis engine to other languages such as XAML.
Hard to teach "languages" in a paragraph or two. People tend to call many things a "language", and in some very general sense they might be, but in the real world of programming, "language" means programming language. In that world, XAML is not a language at all :-) because you can't write code in it.
XAML is a data description format and yes in a very general sense it can be called a language (has some basic constructs, their combination rules, and the result denotes something meaningful). But it doesn't have even its own syntax - it's XML. So it belongs to a group with say MSBuild and HTML. The kind of analysis you could, in theory, do with that is entirely different and very much related to the actual domain that the format describes.
For MSBuild files, you could write some code to analyze the dependencies they denote and, say, try to find and point out holes in them, but all you need for that are a few standard .NET classes. You don't need a parser, you don't have any code generation, etc. You just load the XML like any other and you already have everything and can cruise around and dig out relations. Same for XAML. For everything XML-based, "Roslyn" is XmlDocument or XPathDocument.
Roslyn breaks its back to produce a tree structure that allows you the same kind of cruising freedom that you get just by loading the XAML document into XmlDocument.
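To make the "just load it and cruise around" point concrete, here is a small sketch that lists every element carrying an `x:Name` in a XAML fragment. It uses `XDocument` (LINQ to XML) rather than the `XmlDocument` mentioned above, purely because the query reads shorter; the sample markup and the class name are made up for the illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XamlCruise
{
    // The XAML language namespace, which defines x:Name, x:Class and friends.
    private static readonly XNamespace X = "http://schemas.microsoft.com/winfx/2006/xaml";

    // Returns "ElementType:Name" for every element that has an x:Name attribute.
    public static string[] NamedElements(string xaml)
    {
        return XDocument.Parse(xaml)
            .Descendants()
            .Where(e => e.Attribute(X + "Name") != null)
            .Select(e => e.Name.LocalName + ":" + (string)e.Attribute(X + "Name"))
            .ToArray();
    }

    public static void Main()
    {
        const string xaml =
            @"<Window xmlns=""http://schemas.microsoft.com/winfx/2006/xaml/presentation""
                      xmlns:x=""http://schemas.microsoft.com/winfx/2006/xaml"">
                <StackPanel>
                  <TextBox x:Name=""Input"" />
                  <Button x:Name=""Submit"" Content=""OK"" />
                </StackPanel>
              </Window>";

        foreach (string item in NamedElements(xaml))
            Console.WriteLine(item);
    }
}
```

No parser had to be written and no compiler service was involved; the XML loader already produced the tree.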
I made a simple analysis of a freshly cloned copy of the Roslyn source code (specifically, the Compilers directory), to see how much code is shared between the languages and how much is language-specific.
The results:
in total there are 3072 code files (.cs and .vb file extensions), totaling 78 MB
out of that, the CSharp directory contains 1060 code files, worth 30 MB
the VisualBasic directory contains 1165 code files, which amount to 41 MB
that leaves 847 files and 7 MB for common code
So, as you can see, the two compilers share only a relatively small part of their code. If you want to create a compiler for a new language, basing it on Roslyn won't help you much.
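A count like the one above can be reproduced with a few lines of C#. This is a sketch of one way to do it, not necessarily how the numbers in this answer were obtained; the class name is invented, and the directory to scan is passed on the command line (for example the Compilers directory, then its CSharp and VisualBasic subdirectories in turn).

```csharp
using System;
using System.IO;
using System.Linq;

public static class CodeCensus
{
    // Counts .cs and .vb files below root (recursively) and sums their sizes in bytes.
    public static (int Files, long Bytes) Count(string root)
    {
        var files = Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
            .Where(f => f.EndsWith(".cs", StringComparison.OrdinalIgnoreCase)
                     || f.EndsWith(".vb", StringComparison.OrdinalIgnoreCase))
            .Select(f => new FileInfo(f))
            .ToList();

        return (files.Count, files.Sum(f => f.Length));
    }

    public static void Main(string[] args)
    {
        string root = args.Length > 0 ? args[0] : ".";
        var result = Count(root);
        Console.WriteLine(result.Files + " code files, " + result.Bytes / (1024 * 1024) + " MB");
    }
}
```

Subtracting the per-language totals from the total for the whole directory gives the figure for the shared code.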

C# Interpreter (without compilation) [closed]

Is there a ready-to-use C# interpreter out there that does not rely on runtime compilation?
My requirements are :
A scripting engine
Must Handle C# syntax
Must work on medium-trust environments
Must not use runtime compilation (CodeDomProvider ...)
Open source (or at least free of charge both for personal and professional use)
If this is not clear, I need something like Jint (http://jint.codeplex.com/), but which allows me to write C# scripts instead of JavaScript ones.
Thanks for your help.
Have you looked at paxScript.NET?
Check out the Mono project. They recently demoed CsharpRepl which sounds like what you're after. The PDC 2008 video here.
Update:
On a closer look, it seems that using the Mono.CSharp service to evaluate scripts won't be possible. Currently it is linked to the Mono runtime and they don't expect it to run in a medium trust environment. See this discussion for more info.
One alternative possibility is to include the Mono C# compiler (sources here) in your project and use it to generate assemblies that you load from the file system. If you are worried about the resources required to load all those assemblies, you might have to load them in a separate AppDomain.
"I need to evaluate 10000+ small scripts that are all different, compiling all of them would be just dramatically slow"
Interpreting these would be even more painfully slow. We have a similar issue that we address as follows:
We use the Gold Parser project to parse source code and convert it to an XML based 'generic language'. We run this through a transform that generates VB.Net source code (simply because it's case insensitive). We then compile these using the .Net runtime into a standalone DLL, and call this using heavily restricted access.
It sounds as though you are creating something like a dynamic website where people can create custom modules or snippets of functionality, but using C# for this introduces a couple of main problems. C# has to be compiled; the only way around that is to interpret it at runtime, which is infeasible. And even if you do compile each snippet, you end up with 10,000 DLLs, which is impractical and unusable.
If your snippets rarely change, then I would consider programmatically wrapping them into a single set of source, each with a unique name, and compiling them in a single shot (or as a timed process every 10 minutes?). This is what we do, as it also allows 'versioning' of people's sessions: they continue using the version of the DLL they had at the start of their session, and once no session is using an old version any more, it is removed.
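The wrapping step might look something like the sketch below. It only builds the combined source text; handing that text to a compiler is left out. The assumptions are loud ones: every snippet is taken to be a boolean expression over a single `int x` parameter, and the class name, method naming scheme and parameter are all invented for the illustration, not taken from our actual system.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SnippetBundler
{
    // Wraps each expression snippet in its own uniquely named static method
    // and puts all methods into one class, so a single compilation covers them all.
    public static string Bundle(IReadOnlyList<string> snippets, string className)
    {
        var sb = new StringBuilder();
        sb.AppendLine("public static class " + className);
        sb.AppendLine("{");
        for (int i = 0; i < snippets.Count; i++)
        {
            sb.AppendLine("    public static bool Snippet" + i + "(int x)");
            sb.AppendLine("    {");
            sb.AppendLine("        return " + snippets[i] + ";");
            sb.AppendLine("    }");
        }
        sb.AppendLine("}");
        return sb.ToString();
    }

    public static void Main()
    {
        var snippets = new[] { "x > 4", "x % 2 == 0" };
        Console.WriteLine(Bundle(snippets, "GeneratedSnippets_v1"));
    }
}
```

Putting a version marker into the class (or assembly) name is one simple way to keep an old build alive for sessions that started on it.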
If your snippets change regularly throughout the day, then I would suggest you look at an interpreted scripting language instead, even PHP, and mix your languages depending on the functionality you require. Products such as CScript and LINQPad all use the CodeDomProvider, because you have to have MSIL somewhere if you want to program compiled logic.
The only other option is to write your own interpreter and use reflection to access all the other libraries you need, but this is extremely complex and horrible.
As your requirements are effectively unachievable, I would suggest you take a step back and figure out a way of removing one or more restrictions: find a FullTrust environment to compile your snippets in, remove the need for full code support (i.e. move to interpreted code snippet support), or even change the whole framework to something non-.NET.
LINQPad can work as a code snippet IDE. The application is very small and lightweight. It is free (as in beer) but not open-source. Autocompletion costs extra but not much ($19).
Edit: after reading over the comments in this post a little more carefully, I don't think LINQPad is what you want. You need something that can programmatically evaluate thousands of little scripts dynamically, right? I did this at work using Iron Ruby very easily. If you're willing to use a DLR language, this would probably be more feasible. I also did some similar work with some code that could evaluate a C# lambda expression passed in as a string but that was extremely limited.
I have written an open source project, Dynamic Expresso, that can convert text expressions written in C# syntax into delegates (or expression trees). Expressions are parsed and transformed into Expression Trees without using compilation or reflection.
You can write something like:
var interpreter = new Interpreter();
var result = interpreter.Eval("8 / 2 + 2");
or
var interpreter = new Interpreter()
    .SetVariable("service", new ServiceExample());
string expression = "x > 4 ? service.SomeMethod() : service.AnotherMethod()";
Lambda parsedExpression = interpreter.Parse(expression,
    new Parameter("x", typeof(int)));
parsedExpression.Invoke(5);
My work is based on Scott Gu's article: http://weblogs.asp.net/scottgu/archive/2008/01/07/dynamic-linq-part-1-using-the-linq-dynamic-query-library.aspx
Or try http://www.csscript.net/
Oleg wrote a good intro on CodeProject.
It doesn't handle exact C# syntax, but PowerShell is so well enmeshed with the .NET framework and is such a mature product, I think you would be unwise to ignore it as at least a possible solution. Most server products being put out by Microsoft are now supporting PowerShell for their scripting interface including Microsoft Exchange and Microsoft SQL Server.
I believe Mono has mint, an interpreter they use before implementing the JIT for a given platform. While the docs in the official site (e.g. Runtime) say it's just an intermediate state before consolidating the jitting VM, I'm pretty sure it was there the last time I compiled it on Linux. I can't quite check it right now, unfortunately, but maybe it's in the direction you want.
bungee# is the thing that you want. In a short time, bungee sharp will be an open source project at http://www.crssoft.com/Services/Bungee. You can create scripts with the same C# syntax. There is no assembly creation when you run the script; interpretation is done on the fly, so the performance is high. All the keywords are available, like in C#. I hope you will like it very much.
I faced the same problem. In one project I was looking to provide a generic way to specify conditions controlling when a certain letter has to be generated. In another project the conditions were controlling how cases were assigned to queues. In both of them the following solution worked perfectly:
The Language for the snippets - I chose JScript so that I do not have to worry about variable types.
The Compilation - yes it requires full trust, but you can place your code in a separate assembly and give it full trust. Do not forget to mark it with the AllowPartiallyTrustedCallers attribute.
Number of code snippets - I treated every snippet as a method, not a class. This way multiple methods can be combined into a single assembly
Disk usage - I did all compilation in memory without saving the assembly to disk. It also helps if you need to reload it.
All of this works in production without any problems
Edit
Just to clarify 'snippet' - The conditions I am talking about are just boolean expressions. I programmatically add additional text to turn them into methods, and the methods into compilable classes.
Also I can do the same with C# although I still think JScript is better for code snippets
And BTW my code is open source feel free to browse. Just keep in mind there is a lot of code there unrelated to this discussion. Let me know if you need help to locate the pieces concerning the topic
This one works really well
c# repl and interactive interpreter
Is Snippet Compiler something you are looking for?

C# .net Mnemonics and use in general

I'm just starting out with C# and to me it seems like Microsoft Called their new system .Net because you have to use the Internet to look everything up to find useful functions and which class they stashed it in.
To me it seems nonsensical to require procedure/functions written and designed to stand alone ( non instantiated static objects) to have their class not also function as their namespace.
That is Why can't I use Write or WriteLine instead of Console.WriteLine ?
Then when I start to get used to the idea that the objects I am using ( like string) know how to perform operations I am used to using external functions to achieve ( like to upper, tolower, substring, etc) they change the rules with numbers, numbers don't know how to convert themselves from one numeric type to another for some reason, instead you have to invoke Convert class static functions to change a double to an int and Math class static functions to achieve rounding and truncating.. which quickly turns your simple( in other languages) statement to a gazillion character line in C#.
It also seems obsessed with strong typing which interferes somewhat with the thought process when I code. I understand that type safety reduces errors , but I think it also increases complexity, sometimes unnecessarily. It would be nice if you could choose context driven types when you wish without the explicit Casting or Converting or ToStringing that seems to be basic necessity in C# to get anything done.
So... Is it even possible to write meaningful code in notepad and use cl without Internet access? What ref book would you use without recourse to autocomplete and Network access?
Any suggestions on smoothing the process towards grokking this language and using it more naturally?
I think you're suffering a bit from the fact that you've been used to working one way for some years, and now must take time to get comfortable developing on a new platform.
I do not agree with you , that MS hasn't been consistent on the fact that a string knows how it should convert itself to another type, and other datatypes (like ints) do not.
This is not true, since strings do not know for themselves how they should be converted to another type as well. (You can use the Convert class to Convert types to other types).
It is however true that every type in .NET has a ToString() method, but, you should not rely on that method to convert whatever you have to a string.
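For the double-to-int case raised in the question, it may help to see the options side by side, because they do not all give the same answer. A small sketch (the class and method names are made up for the example):

```csharp
using System;

public static class NumericConversions
{
    // Four ways of getting from a double to a whole number.
    public static (int Cast, int Converted, double RoundedAway, double Truncated) AllWays(double d)
    {
        int cast = (int)d;                     // explicit cast: drops the fraction (toward zero)
        int converted = Convert.ToInt32(d);    // rounds; exact halves go to the nearest even number
        double roundedAway = Math.Round(d, MidpointRounding.AwayFromZero); // "school" rounding
        double truncated = Math.Truncate(d);   // drops the fraction, result stays a double
        return (cast, converted, roundedAway, truncated);
    }

    public static void Main()
    {
        var r = AllWays(2.5);
        // Prints: 2 2 3 2
        Console.WriteLine(r.Cast + " " + r.Converted + " " + r.RoundedAway + " " + r.Truncated);

        // Strings go through the same kind of static helpers:
        int parsed = int.Parse("42");
        Console.WriteLine(parsed + 1);
    }
}
```

The plain cast is usually the short form people coming from C are looking for; Convert and Math exist for the cases where the rounding rule matters.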
I think you have never worked in an OO language before, and therefore, you're having some difficulties with the paradigm shift.
Think of it this way: it's all about responsibilities and behaviour. A class is (if it is well designed) responsible for doing one thing, and does this one thing well.
There is no excuse to use notepad to code a modern language. SharpDevelop or Visual C# Express provide the functionality to work with C# in a productive way.
And no, due to the complexity, not using the internet as a source of information is also not a good option.
You could buy a book that introduces you to the concepts of the language in a structured way, but to get up-to-date information, the internet is necessary.
Yes, there are drawbacks in C#, like in any other language. I can only give you the advice to get used to the language. Many of the drawbacks become understandable after that, even if some of them don't become less annoying. I recommend that you ask clear, direct questions with example code if you want to know how some language constructs work or how you can solve specific problems more efficiently. That makes it easier to answer those questions.
For notepad, I have no useful advice; however, I would advise you to use one of the free IDEs: Microsoft's Express Editions, or SharpDevelop.
The IDE will speed up the grokking of the language, at which point you can switch back to notepad.
Reading your post I was thinking that you worked mostly with C or dynamic languages previously. Maybe C# is just a wrong choice for you, there are IronPython, F# and a bunch of other languages that have necessary functionality (like functions outside of classes etc.)
I disagree with you about consistency. In fact there are small inconsistencies between some components of .NET, but most of the framework is very consistent and predictable.
Strong typing is a huge factor in low defect count. Dynamic typing plays nice in small/intermediate projects (like scripts, etc). In more complex programs, dynamism can introduce a lot of complexity.
Regarding internet/autocomplete - I can hardly imagine any technology with size of .NET that doesn't require a lot of knowledge sources.
Programming in C# using notepad is like buying a Ferrari to drive on dirt roads.
At least use Visual Studio Express Edition. From what you wrote, I understand that you come from a non-OO background; try to learn the OO concepts and try to use them. You will eventually understand most design decisions made for .NET.
http://en.wikipedia.org/wiki/Object-oriented_programming
Oh boy, where do I start with you (this will be a long post, hahaha). Well, let's go little by little:
"Microsoft called their system .NET because you have to use the Internet...": the reason it is called .NET is that the SUITE OF MICROSOFT LANGUAGES (and now some other ones too, like Python and Ruby, etc.) CAN CALL ANY LIBRARY or DLL. For example, you can "NET" (network, OR CALL) a DLL that was built in Visual Basic, F#, or C++ from WITHIN C#, and from any of those languages you can also call (or ".NET") C# libraries. OK, ONE DOWN!!!
NEXT ONE: "it seems nonsensical to require....to have their class not also function as their namespace", this is because a Namespace can have AS MANY CLASSES AS YOU WISH, and your question:
"That is Why can't I use Write or WriteLine instead of Console.WriteLine ?".
The reason is that "Console" (System.Console, hence the "using System;" directive at the beginning of your program) is the CLASS where "Write" and "WriteLine" LIVE!! Console is not a namespace; System is. (You can also FULLY qualify the call as System.Console.WriteLine.) All this suggests to me that you need to study C# syntax. OK, NEXT:
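For what it's worth, C# 6 and later do let you call WriteLine without the class name, through a `using static` directive. A minimal sketch (the class name Hello is just for the example):

```csharp
using static System.Console; // imports the static members of the Console class

public static class Hello
{
    public static void Main()
    {
        WriteLine("Hello");                      // resolves to System.Console.WriteLine
        System.Console.WriteLine("Hello again"); // fully qualified form, needs no using at all
    }
}
```

This only works for static members, and only for classes you explicitly import this way, so the class still acts as the container; you just opt in to dropping its name.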
"when I start to get used to the idea that the objects...", ok in simple words:
C# is a "Strongly Type-Safe language", so that SHOULD-MUST tell you what "you are getting into"; otherwise STAY WITH "WEAK or NO TYPE SAFE LANGUAGES" LIKE PHP or C, etc. This does NOT mean it is bad, it just MEANS IT IS YOUR JOB TO MAKE SURE. As I tell my students: "IF YOU NEED AN INT THEN DEFINE AN INT INSTEAD OF LETTING THE COMPILER DO IT FOR YOU, OTHERWISE YOU WILL HAVE A LOT OF BAD BUGS", or in other words, do YOUR homework BEFORE DESIGNING A PIECE OF SOFTWARE.
Note: C# is an IMPLICITLY TYPE SAFE language, SO IF YOU WANT YOU CAN RUN IT AS UNSAFE; from then on it wiLL be your job to make sure, so don't complain later (for being lazy) when bugs arrive AT RUNTIME (and a lot of times when the customer is already using your crappy software).
...and last but not least: Why do you want to shoot yourself by using notepad? Studio Express is FREE, even the database SQL SERVER is FREE TOO!! Unless you work for a company, in which case I WOULD ASK FOR PRO, ETC. All the "extra" stuff is for large companies, teams, etc. YOU CAN DO 99% OF THE STUFF WITH THE FREE VERSIONS (and you can still buy/upgrade to the full version once you want to scale to Distributed Software or a Large Project, or if your software becomes a big hit. Example: if you need millions of queries or hits PER SECOND from your database, or 100 people are working on the same project (code). But the majority of the time, for 2 or 3 "normal" developers working at home or in a small office, the FREE ONES ARE ENOuGH!!)
cheers!!! (PS: Software Developer since the 80's)

Looking for a configurable pretty printer for C# code

I work on a team with about 10 developers. Some of the developers have very exacting formatting needs. I would like to find a pretty printer that I could configure to these specifications and then add to the build processes. In this way no matter how badly other people mess up the format when it is pulled down from source control it will look acceptable.
The easiest solution is for the team lead to mandate a format and everyone use it. The VS defaults are pretty good.
Jeff Atwood did that to us here on Stack Overflow and while I rebelled at first, I got over it :) Makes everything much easier!
Coding standards are definitely something we have. The code formatting I am talking about is imposed by a grizzled architect who is, let's say, set in his ways and extremely particular. Let's just pretend that we cannot address the human factor. I was looking for a way to circumvent the whole human process.
The visual studio defaults sadly do not address line breaks very well. I am just making this line chopping style up but....
ServiceLocator.Logger.WriteDefault(string.Format("{0}{1}"
                                                 ,foo
                                                 ,bar)
                                   ,Logging.SuperDuper);
another example of formatting visual studio is not too hot at....
if(    foo
    && (    bar
         || baz
         || apples
         || oranges)
    && IsFoo()
    && IsBar() ){
}
Visual Studio does not play well at all with stuff like this. We are currently using ReSharper to allow for more granularity with formatting, but it sadly falls short in many areas.
Don't get me wrong though coding standards are great. The goal of the pretty printer as part of the build process is to get 'perfect' looking code no matter how well people are paying attention or counting their spaces.
The edge cases around code formatting are very solvable since it is a well defined grammar.
As far as the VS defaults go I can only say: BSD style or die!
So all that brings me full circle back to: Is there a configurable pretty printer for C#? As much as lexical analysis and parsing fascinate me, I have about had my fill making a YAML C# tool chain.
Your issue was the primary intent for creating NArrange (beta). It allows configurable reformatting of C# code and you can use one common configuration file to be shared by the entire team. Since its focus is primarily on reordering members in classes and controlling regions, it is still lacking many necessary formatting options (especially formatting within member code lines).
The normal usage scenario is for each developer to run the tool prior to check-in. I'm not aware of anyone running it as part of their build process, but there is no reason why you couldn't, since it is a command-line tool. One idea that I've contemplated is running NArrange on files as part of a pre-commit step. If the original file contents being checked in don't match the NArrange formatted output on the source repository server, then the developer didn't reformat to the rules and a check-in error can be raised.
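The pre-commit idea boils down to "format a copy, compare, reject on difference". Here is a sketch of that comparison with the formatter passed in as a delegate. The delegate is a stand-in: nothing here is NArrange's actual API, and the demo formatter below only strips trailing whitespace so the example can run on its own.

```csharp
using System;
using System.Linq;

public static class FormatGate
{
    // True when running the formatter over the source changes nothing
    // (line endings are normalised first so CRLF vs LF is not reported).
    public static bool IsFormatted(string source, Func<string, string> formatter)
    {
        string original = Normalize(source);
        return Normalize(formatter(original)) == original;
    }

    private static string Normalize(string text)
    {
        return text.Replace("\r\n", "\n");
    }

    // Stand-in formatter for the demo only: strips trailing whitespace per line.
    public static string TrimTrailingWhitespace(string source)
    {
        return string.Join("\n", source.Split('\n').Select(line => line.TrimEnd()));
    }

    public static int Main()
    {
        string checkedIn = "int x = 1;   \nint y = 2;\n";
        bool ok = IsFormatted(checkedIn, TrimTrailingWhitespace);
        Console.WriteLine(ok ? "formatted" : "needs reformatting");
        return ok ? 0 : 1; // a non-zero exit code lets a commit hook reject the change
    }
}
```

In a real hook the delegate would shell out to whatever command-line formatter the team has agreed on and read back its output.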
For more information, see my CodeProject article on Using NArrange to Organize C# Code.
Update 2023-02-25: NArrange appears to have moved to Github. The NArrange site (referenced above) is no longer available although there are copies in web.archive.org
I second Jarrod's answer. If you have 2 developers with conflicting coding preferences, then get the rest of the team to vote, and then get the boss to back the majority decision.
Additionally, the problem with trying to automatically apply a pretty printer like that, is that there will always be exceptional cases where your blanket coding standard is not the best or most readable solution, and you will lose out by squashing them with an automated tool.
Coding Standards are just that, standards. They don't call them Coding Laws or Coding Rules, and there's a good reason for that.
