Is it possible to call C# lexical/syntactic analyzers without compilation? - c#

Consider this question on SO, where the whole C# in-memory compiler is called. I only need lexical and syntactic analysis: parse the text as a stream of lexemes, check them, and exit.
Is this possible in the current version of System.CodeDom.Compiler? If not, will it be?

If you can use Mono, I believe it has a C# parser/lexer you may be able to use.
Here's a link to look into. As for what the MS C# team is planning to do, there is some talk of at some point making the C# compiler into a "service" - but it's unclear what that means or when that will happen.

While it might look like the code is compiled in-memory (CompilerParameters.GenerateInMemory), that's not what actually happens. The same compiler as the one used in Visual Studio is used to compile the code (csc.exe). It gets started by CreateProcess (much like Process.Start) and runs out-of-process to compile the code to an assembly on disk in a temporary folder. The GenerateInMemory option invokes Assembly.LoadFrom() to load the assembly.
You'll get the equivalent of a syntax check simply by setting GenerateInMemory to false and deleting the OutputAssembly after it is done.
While this might sound kinda backwards, the huge benefit it has is that this won't put any memory pressure on your process. This will hold you over until C# 5.0 ships.
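A minimal sketch of the approach described above, assuming the .NET Framework (where System.CodeDom.Compiler can launch csc.exe). The names SyntaxChecker and Check are made up for illustration. Note that this runs the full compiler, so it also reports semantic errors such as unknown types, not only syntax errors.

```csharp
using System;
using System.CodeDom.Compiler;
using System.Collections.Generic;
using System.IO;
using Microsoft.CSharp;

// Hypothetical helper: names are illustrative, not part of any library.
public static class SyntaxChecker
{
    // Compiles the source to a throwaway assembly on disk and returns the
    // compiler's error messages. An empty list means the source compiled.
    public static List<string> Check(string source)
    {
        string output = Path.Combine(
            Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".dll");

        var parameters = new CompilerParameters
        {
            GenerateInMemory = false,   // do not load the result into this process
            GenerateExecutable = false, // build a library, so no Main is required
            OutputAssembly = output
        };

        var messages = new List<string>();
        try
        {
            using (var provider = new CSharpCodeProvider())
            {
                CompilerResults results =
                    provider.CompileAssemblyFromSource(parameters, source);

                foreach (CompilerError error in results.Errors)
                {
                    if (!error.IsWarning)
                    {
                        messages.Add(string.Format("({0},{1}): {2} {3}",
                            error.Line, error.Column,
                            error.ErrorNumber, error.ErrorText));
                    }
                }
            }
        }
        finally
        {
            // Delete the assembly once the check is done, as suggested above.
            if (File.Exists(output))
            {
                File.Delete(output);
            }
        }
        return messages;
    }
}
```

Calling SyntaxChecker.Check("class A {") should return at least one message, while a well-formed source such as "class A { }" should return none. The code never touches results.CompiledAssembly, so nothing gets loaded into the calling process.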

Related

Exclude other methods, variables and classes that are not needed for your method to work

You load a foreign code example with libraries attached to it in Visual Studio. Now there is a method that you want to reuse in your code. Is there a function in VS that lets you strip the code from all unnecessary code to only have code left that is necessary for your current method to run?
It is not about the library. Loading a .sln or .csproj and having classes upon classes when you just want one method out of it is a waste of performance, RAM and space. It is about code you can easily omit, or references (what I call libraries) you can easily omit. Part of the question is: which "using" statements are needed only for your current method and the methods that pass parameters to it? In short, showing relevant code only: code that is tied together.
Let's use an example: You go to GitHub and download source code in C#. Let's call the solution S. You open S in Visual Studio. You don't disassemble, you just load the source code of S, which is there in plain text. Then you find a method M, in plain text, that you want to use. M contains some objects whose classes were defined somewhere in the project. The goal is to recreate the surroundings only for this method, so I can copy & paste it into my own solution without having red underlined words in almost every line within the method.
After reading the question and the comments, I think I have a vague idea what you are referring to.
If we ignore the context of the method you are referring to, you can extract any piece of code from a "library" by using a .NET decompiler and assembly browser.
There are many of them for free, such as:
dotPeek,
ILSpy
...
This will allow you to see the method's code. From there on, you can proceed as you like. If you copy the method to your code base, you might still have to change it a bit in order to adapt it to work with your objects and context. If you don't, this will at least give you insight into how the method works and might help you understand the logic, so you can write your own.
Disclaimer: With this post, I am pointing out that it is possible to extract code from an assembly. I am not discussing the ethics or legal perspective behind such actions.
Hope this helps,
Happy Coding!
If it's just one method, look at the source code and copy it to your library. Make sure you add a comment saying where you obtained the code and who holds the copyright! Don't forget to include the licence, which you should have done with a library reference anyway.
That said, it is currently not (officially) possible to automatically remove unused publicly declared code from a library (assembly). This process is called tree shaking, by the way. Exception: .NET Native.
But .NET Native is only available for Windows Store Apps. You can read more about it here.
That said, we have the JIT (Just in Time) compiler, which is really smart. I wouldn't worry about a few KB of library code. Spend your time optimizing your SQL queries and other bottlenecks. The classes are only loaded when you actually use them.
Using some unstable solution, or maintaining a fork of a library where you use more than one method (with no documentation and no expertise, since it is your own fork), isn't worth the headache you will have!
If you really want to go the route of removing everything you do not want, you can open the solution, declare everything as internal (search and replace is your friend) and restore to public the parts that give you a build-time error or a runtime error (reflection). Then remove everything that is internal. There are several design-time tools, like ReSharper, which can remove dead code.
But as I said, it's not worth it!
For .NET Core users: in 6-8 weeks we will have the .NET IL Linker, as spender has commented, and it looks promising. What does this mean? The .NET framework evolves from time to time. Let it evolve, and focus on your productivity in the meantime.

CSharpCodeProvider - Is it abusable?

Apologies for the shortness of the question, however I don't think it needs much elaboration.
Are there any security implications caused by using the CSharpCodeProvider, and could it open a server up for attack?
It depends on how you use it. Here is a summary sorted from the safe use to a use that you certainly don't want to allow (when running the code on a server or some environment that you want to control):
If you use CSharpCodeProvider just for generating C# source code, then you only need permission to save the generated files to some directory, or nothing at all (if it is possible to get the code generated into a memory stream).
If you use it for compiling generated C# source, then you need a permission to run csc.exe (which may not be available in some limited environments such as shared hostings).
If you just generate files & compile them, then it probably won't be harmful (although someone could probably abuse your application to generate many, many files and attack the server using some kind of DoS attack).
If you also load & execute the generated code, then it depends on how you generate it. If you assume that there are no bugs in C#/CodeDOM and can guarantee that the generated code is safe, then you should be fine.
If your code contains things such as CodeSnippetExpression that can be provided by the user (in some way), then the user can write and run anything he or she wants on your server, so this would be potentially quite dangerous.
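To make the last point concrete, here is a small sketch of why CodeSnippetExpression is the risky part. It only generates source text into a StringWriter and compiles nothing. The class and method names (SnippetDemo, GenerateUnsafe, GenerateSafe) are made up for illustration.

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

// Hypothetical demo class: names are illustrative only.
public static class SnippetDemo
{
    // The user's text is pasted into the generated source verbatim,
    // so whatever the user typed becomes code.
    public static string GenerateUnsafe(string userText)
    {
        return Generate(new CodeSnippetExpression(userText));
    }

    // The user's text is emitted as a string literal, so it stays data.
    public static string GenerateSafe(string userText)
    {
        return Generate(new CodePrimitiveExpression(userText));
    }

    // Builds a System.Console.WriteLine(<argument>) call and renders it as C#.
    private static string Generate(CodeExpression argument)
    {
        var call = new CodeMethodInvokeExpression(
            new CodeTypeReferenceExpression("System.Console"),
            "WriteLine",
            argument);

        using (var provider = new CSharpCodeProvider())
        using (var writer = new StringWriter())
        {
            provider.GenerateCodeFromExpression(
                call, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }
}
```

With GenerateUnsafe, an input such as a call to System.IO.File.Delete ends up as a real method call in the generated source. With GenerateSafe, the same input is only a quoted string that gets printed. If user input has to flow into generated code at all, keeping it inside primitive (literal) expressions is the more defensive choice.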
Sort of. On the surface it's not a direct risk, because you're not running code, just compiling it. However, there's nothing that says that the C# compiler doesn't contain some sort of bug that, given the right malicious input, would cause it to bail out and start executing commands directly.
However, if you later execute the compiled code (and presumably you do -- otherwise why would you compile it to begin with?), it will be running in the same context as you are. Obviously, that has all kinds of unpleasant security implications, much like using the quasi-analogous eval() feature of other languages.
It depends on the source that you are compiling. If you have enough control over the source, then it might be an acceptable risk. If you are allowing someone outside of your sphere of trust to supply code to the compiler, it might be an unacceptable risk.

Make an executable at runtime

Ok, so I was wondering how one would go about creating a program that creates a second program (like how most compression programs can create self-extracting executables, but that's not what I need).
Say I have 2 programs, each one containing a class. The first program I would use to modify the class and fill it with data. The second would be a program that also has the class, but empty, and its only purpose is to access this data in a specific way. I don't know, I'm thinking the specific class could be serialized and then "injected" into the second file. But how would one be able to do that? I've found modifying files that were already compiled fascinating, though I've never been able to make changes that didn't cause errors.
That's just a thought. I don't know what the solution would be, that's just something that crossed my mind.
I'd prefer some information in, say, C or C++ that's cross-platform. The only other language I'd accept is C#.
also
I'm not looking for third-party libraries, or things such as Boost. If anything, a shove in the right direction could be all I need.
++also
I don't want to be using a compiler.
Jalf actually read what I wrote
That's exactly what I would like to know how to do. I think that's fairly obvious by what I asked above. I said nothing about compiling the files, or scripting.
QUOTE "I've found modifying files that were already compiled fascinating"
Please read and understand the question first before posting.
thanks.
Building an executable from scratch is hard. First, you'd need to generate machine code for what the program would do, and then you need to encapsulate such code in an executable file. That's overkill unless you want to write a compiler for a language.
These utilities that generate a self-extracting executable don't really make the executable from scratch. They have the executable pre-generated, and the data file is just appended to the end of it. This works because the Windows executable format allows you to put data at the end of the file: the loader cares only about the "real executable" part (the exe header tells how big it is) and ignores the rest.
For instance, try generating two self-extracting zips and do a binary diff on them. You'll see their first X KBytes are exactly the same. What changes is the rest, which is not an executable at all, just data. When the file is executed, it looks at what is found at the end of the file (the data) and unzips it.
Take a look at the wikipedia entry, go to the external links section to dig deeper:
http://en.wikipedia.org/wiki/Portable_Executable
I only mentioned Windows here, but the same principles apply to Linux. But don't expect cross-platform results: you'll have to re-implement it for each platform. I can't imagine anything more platform-dependent than the executable file. Even if you use C# you'll have to generate the native stub, which is different depending on whether you're running on Windows (under .NET) or Linux (under Mono).
Invoke a compiler with data generated by your program (write temp files to disk if necessary) and/or stored on disk?
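The append-data idea from this answer can be sketched in a few lines. This is only an illustration under my own assumptions: the footer layout (payload bytes followed by an 8-byte length) and the names AppendedPayload, Write and Read are made up, and real self-extractors use their own formats.

```csharp
using System;
using System.IO;

// Hypothetical helper: layout and names are illustrative only.
public static class AppendedPayload
{
    // Copies the pre-built stub to outputPath, then appends the payload
    // followed by its length as an 8-byte footer.
    public static void Write(string stubPath, string outputPath, byte[] payload)
    {
        File.Copy(stubPath, outputPath, true);
        using (var stream = new FileStream(outputPath, FileMode.Append, FileAccess.Write))
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(payload);
            writer.Write((long)payload.Length);
        }
    }

    // Reads the footer at the end of the file, then the payload before it.
    public static byte[] Read(string path)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        using (var reader = new BinaryReader(stream))
        {
            if (stream.Length < sizeof(long))
            {
                throw new InvalidDataException("File is too small to hold a footer.");
            }

            stream.Seek(-sizeof(long), SeekOrigin.End);
            long length = reader.ReadInt64();
            if (length < 0 || length > int.MaxValue
                || length > stream.Length - sizeof(long))
            {
                throw new InvalidDataException("No valid payload footer found.");
            }

            stream.Seek(-(sizeof(long) + length), SeekOrigin.End);
            return reader.ReadBytes((int)length);
        }
    }
}
```

The first program would call Write with the serialized class as the payload. The second program (the stub) would call Read on its own file path at startup, for example the path returned by System.Reflection.Assembly.GetEntryAssembly().Location, and deserialize the bytes. No compiler is involved at any point.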
Or is the question about the details of writing the local executable format?
Unfortunately, with compiled languages such as C, C++, Java, or C#, you won't be able to just "run" new code at runtime, like you can do in interpreted languages like PHP, Perl, and ECMAScript. The code has to be compiled first, and for that you will need a compiler. There's no getting around this.
If you need to duplicate the save/restore functionality between two separate EXEs, then your best bet is to create a static library shared between the two programs, or a DLL shared between the two programs. That way, you write that code once and it's able to be used by as many programs as you want.
On the other hand, if you're really running into a scenario like this, my main question is, What are you trying to accomplish with this? Even in languages that support things like eval(), self modifying code is usually some of the nastiest and bug-riddled stuff you're going to find. It's worse even than a program written completely with GOTOs. There are uses for self modifying code like this, but 99% of the time it's the wrong approach to take.
Hope that helps :)
I had the same problem and I think that this solves all problems.
You can put there whatever code and if correct it will produce at runtime second executable.
--ADD--
So in short, you have some code which you can hard-code and store in the code of your 1st exe file, or keep outside it. Then you run it and it compiles the aforementioned code. If everything is OK you will get a second executable, compiled at runtime. All this without any external lib!!
"Ok, so I was wondering how one would go about creating a program, that creates a second program"
You can look at CodeDom. Here is a tutorial
Have you considered embedding a scripting language such as Lua or Python into your app? This will give you the ability to dynamically generate and execute code at runtime.
From wikipedia:
Dynamic programming language is a term used broadly in computer science to describe a class of high-level programming languages that execute at runtime many common behaviors that other languages might perform during compilation, if at all. These behaviors could include extension of the program, by adding new code, by extending objects and definitions, or by modifying the type system, all during program execution. These behaviors can be emulated in nearly any language of sufficient complexity, but dynamic languages provide direct tools to make use of them.
Depending on what you call a program, Self-modifying code may do the trick.
Basically, you write code somewhere in memory as if it were plain data, and you call it.
Usually it's a bad idea, but it's quite fun.

Do you use regular builds as a coding tool?

We have a large (about 580,000 loc) application which in Delphi 2006 builds (on my machine) in around 20 seconds. When you have build times in seconds, you tend to use the compiler as a tool, i.e. write a little code, build, write some more code, build some more, and so on. As we move some of our stuff over to C#, does anyone have a comparison of how long something that size would take to build? I only have small apps and components at the moment, so I can't really compare. If things are going to take a lot longer to build, then I may need to change my style! Or is my style just lazy?
For example, if I'm changing the interface of a method call, rather than do a full search on all the app to find out where I need to make changes to calls, I'll use the compiler to find them for me.
Visual Studio 2008 SP1 now has background compilation for C# (it's always had it for VB.NET). Back in my VB days, I often used this to find where something was referenced by changing the name and then seeing where the background compiler said there was an error.
I have never worked on anything quite this large. At my last job we had about 60,000 loc spread over about 15 projects and it took about 10 seconds to compile. Maybe someone else can post a slightly larger case study.
I used to use the compiler as you describe, but since I've been using ReSharper I do this a lot less.
Also, for things like rename, the refactoring support (both in Visual Studio 2005 upwards and, even better, from ReSharper) means I don't have to do search + replace to rename things.
One thing you can take advantage of, especially in desktop apps, as I imagine you are dealing with coming from Delphi, is Edit and Continue. This lets you change actual code while you are running in debug mode. You can change just about anything, except for adding class level variables, methods, or new classes, and still continue running without having to recompile your project.
I only use the "Syntax Check" to see if I left some typo in the code... And these are much reduced, since I use the "Code Proofreader" of the GExperts plugin.
Well, the compiler doesn't have to be that fast for you to take advantage of it. Some IDEs support incremental compilation on every file save, or even on the fly. This works great.
You can split the application into several projects (by layer and/or module, etc.) and compile only the project you are actually working on.
The last part of your post scares me. I am not familiar with other IDEs but MSDev allows you to find all references to a method - so you don't have to compile just to find all the method calls you broke.
Use whatever works, but it is good you are open to new ways of doing things.

Assembler library for .NET, assembling runtime-variable strings into machine code for injection

Is there such a thing as an x86 assembler that I can call through C#? I want to be able to pass x86 instructions as a string and get a byte array back. If one doesn't exist, how can I make my own?
To be clear - I don't want to call assembly code from C# - I just want to be able to assemble code from instructions and get the machine code in a byte array.
I'll be injecting this code (which will be generated on the fly) into another process altogether.
As part of some early prototyping I did on a personal project, I wrote quite a bit of code to do something like this. It doesn't take strings -- x86 opcodes are methods on an X86Writer class. It's not documented at all, and has nowhere near complete coverage, but if it would be of interest, I would be willing to open-source it under the New BSD license.
UPDATE:
Ok, I've created that project -- Managed.X86
See this project:
https://github.com/ZenLulz/MemorySharp
This project wraps the FASM assembler, which is written in assembly and is compiled as a Microsoft COFF object, wrapped by a C++ project, and then wrapped again in C#. This can do exactly what you want: given a string of x86/x64 assembly, it will produce the bytes needed.
If you require the opposite, there is a port of the Udis86 disassembler, fully ported to C#, here:
https://github.com/spazzarama/SharpDisasm
This will convert an array of bytes into the instruction strings for x86/x64.
Take a look at Phoenix from Microsoft Research.
Cosmos also has some interesting support for generating x86 code:
http://www.gocosmos.org/blog/20080428.en.aspx
Not directly from C#, you can't. However, you could potentially write your own wrapper class that uses an external assembler to compile code. So, you would write the assembly out to a file, use the .NET Framework to spin up a new process that executes the assembler program, and then use System.IO to open the file generated by the assembler and pull out the byte stream.
However, even if you do all that, I would be highly surprised if you don't then run into security issues. Injecting executable code into a completely different process is becoming less and less possible with each new OS. With Vista, I believe you would definitely get denied. And even in XP, I think you would get an access denied exception when trying to write into memory of another process.
Of course, that raises the question of why you are needing to do this. Surely there's got to be a better way :).
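The wrapper idea from this answer might look roughly like the sketch below. The answer does not name an assembler. I am assuming NASM is installed and on the PATH, and using its flat binary output (-f bin). The names ExternalAssembler and Assemble are made up for illustration.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical wrapper: assumes the NASM command-line assembler is installed.
public static class ExternalAssembler
{
    // Writes the assembly text to a temp file, runs the assembler on it,
    // and returns the raw machine code bytes it produced.
    public static byte[] Assemble(string assemblySource, string assemblerPath = "nasm")
    {
        string sourceFile = Path.GetTempFileName();
        string outputFile = Path.GetTempFileName();
        try
        {
            File.WriteAllText(sourceFile, assemblySource);

            var startInfo = new ProcessStartInfo
            {
                FileName = assemblerPath,
                // -f bin asks NASM for a flat binary with no object-file wrapper.
                Arguments = string.Format(
                    "-f bin \"{0}\" -o \"{1}\"", sourceFile, outputFile),
                UseShellExecute = false,
                RedirectStandardError = true,
                CreateNoWindow = true
            };

            using (Process process = Process.Start(startInfo))
            {
                string errors = process.StandardError.ReadToEnd();
                process.WaitForExit();
                if (process.ExitCode != 0)
                {
                    throw new InvalidOperationException("Assembler failed: " + errors);
                }
            }

            return File.ReadAllBytes(outputFile);
        }
        finally
        {
            File.Delete(sourceFile);
            File.Delete(outputFile);
        }
    }
}
```

With NASM, the source string needs a BITS 32 (or BITS 64) directive at the top, because the flat binary format defaults to 16-bit code. For example, Assemble("BITS 32\nmov eax, 1\nret\n") should give the six bytes B8 01 00 00 00 C3. Starting a process per snippet is slow, so for code generated on the fly in a tight loop an in-process assembler library would be a better fit.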
Take a look at this: CodeProject: Using unmanaged code and assembly in C#.
I think you would be best off writing a native Win32 dll. You can then write a function in assembler that is exported from the dll. You can then use C# to dynamically link to the dll.
This is not quite the same as passing in a string and returning a byte array. To do this you would need an x86 assembler component, or a wrapper around masm.exe.
I don't know if this is how it works, but you could just ShellExecute an external compiler and then load the generated object into your byte array.
