LINQ: how to get a string with the proper encoding - C#

I have a problem getting a string from a file via LINQ.
My file is:
LANG_FORM="nnd documents acceptance"
%>
Response.Write "<SCRIPT LANGUAGE=javascript>alert('" & LN("KtśćóŻ") & "');</SCRIPT>"
It is part of an ASP file, but that doesn't matter here.
I need to get the value passed to the LN function.
I wrote this LINQ query:
var LN = from place in File.ReadAllLines(item.file)
where Regex.IsMatch(place, pattern)
select new { place };
In the debug view the output is not correct:
{ place = Response.Write "<SCRIPT LANGUAGE=javascript>alert('" & LN("Kt���") & "');</SCRIPT>" }
My question is: how should I write the LINQ query to get the correct output (they are Polish letters)?

I think the encoding is wrong.
Try File.ReadAllLines(String, Encoding):
var LN = from place in File.ReadAllLines(item.file, Encoding.UTF8)
where Regex.IsMatch(place, pattern)
select new { place };
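You can pass whichever encoding the file was actually saved in, not only Encoding.UTF8; for example, older files with Polish text may use the Central European code page Windows-1250.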
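A fuller sketch of the whole task, assuming the ASP file is saved as UTF-8 (if it is actually Windows-1250, pass Encoding.GetEncoding(1250) instead; on .NET Core and .NET 5+ that code page first needs Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)). The temp file name ln_sample.asp is hypothetical, and the regex captures only the string literal inside LN("...") rather than returning the whole line:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical sample file standing in for item.file, written as UTF-8.
string path = Path.Combine(Path.GetTempPath(), "ln_sample.asp");
File.WriteAllLines(path, new[]
{
    "LANG_FORM=\"nnd documents acceptance\"",
    "%>",
    "Response.Write \"<SCRIPT LANGUAGE=javascript>alert('\" & LN(\"KtśćóŻ\") & \"');</SCRIPT>\""
}, Encoding.UTF8);

// Captures the text between LN(" and ").
var lnCall = new Regex(@"LN\(""([^""]*)""\)");

// Read with an explicit encoding, keep only lines that call LN, project the argument.
var values = (from place in File.ReadAllLines(path, Encoding.UTF8)
              let match = lnCall.Match(place)
              where match.Success
              select match.Groups[1].Value).ToList();

foreach (string value in values)
    Console.WriteLine(value);
```

If the console still shows question marks, check the console's own output encoding; the string in memory is already correct once the file is read with the right encoding.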

Related

get the name of a file without empty spaces

I'm working on an MVC project and receive a file (an HttpPostedFileBase property) in my controller via model binding. I want to remove all the spaces from the name of the file I just received, and for that purpose I use this:
var nombre_archivo = DateTime.Now.ToString("yyyyMMddHHmm.") +"_"+ (info.file.FileName.ToString()).Split(new[] { '.' }, StringSplitOptions.RemoveEmptyEntries);
but nombre_archivo is always 201801240942.System.String[], whereas what I want is 201801240942.nameOfFile. Could you please tell me where the error is?
Split returns a string[], and concatenating an array with + calls its ToString(), which yields "System.String[]" instead of the file name.
Use Replace instead:
var nombre_archivo = string.Format("{0}_{1}",
    DateTime.Now.ToString("yyyyMMddHHmm."),
    info.file.FileName.Replace(" ", "")
);
Moreover, string.Format (or string interpolation in C# 6 and later) is arguably clearer than + concatenation:
var name = $"{DateTime.Now.ToString("yyyyMMddHHmm.")}_{info.file.FileName.Replace(" ", "")}";
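To see where System.String[] comes from, here is a minimal sketch with a hypothetical file name and a fixed timestamp so the result is predictable: concatenating a string[] with + calls the array's ToString(), which returns its type name, whereas Replace returns an ordinary string.

```csharp
using System;

string fileName = "name Of File.pdf";   // hypothetical uploaded name
string stamp = "201801240942.";

// Split returns string[]; "+" calls ToString() on the array, which yields its type name.
string[] parts = fileName.Split(new[] { '.' }, StringSplitOptions.RemoveEmptyEntries);
string broken = stamp + "_" + parts;

// Replace returns a new string with every space removed.
string fixedName = stamp + "_" + fileName.Replace(" ", "");

Console.WriteLine(broken);
Console.WriteLine(fixedName);
```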
You can use String.Replace to replace a space with an empty string. The method has other issues, though. It doesn't check whether FileName is valid, which means someone could make a POST request with a hand-coded path like ../../ or E:\somepath\myinnocentprogram.exe to write a file to the server's disk. Or worse, ../index.htm.
Replacing spaces doesn't make much sense. It's the dots and slashes that can result in a file being written outside the folder you intended.
If you check Uploading a File (Or Files) With ASP.NET MVC, you'll see that the author uses Path.GetFileName() to retrieve only the file's name before saving it in the proper folder. Your code should look something like this:
[HttpPost]
public ActionResult Index(HttpPostedFileBase file) {
    // file is null when the form was posted without a file
    if (file != null && file.ContentLength > 0) {
        // Keep only the file-name part, dropping any client-supplied directories
        var fileName = Path.GetFileName(file.FileName)
                           .Replace(" ", "");
        var finalName = String.Format("{0:yyyyMMddHHmm}._{1}", DateTime.Now, fileName);
        var path = Path.Combine(Server.MapPath("~/App_Data/uploads"), finalName);
        file.SaveAs(path);
    }
    return RedirectToAction("Index");
}
This ensures that only the file-name part is used and that the file is saved in the appropriate folder, even if someone posts an invalid path.
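The same idea as a small testable function, a sketch that assumes you pass in FileName and the current time (BuildStoredName is a hypothetical helper name). It keeps only the last path segment, strips all whitespace and prefixes a timestamp formatted with the invariant culture. On Linux and macOS, Path.GetFileName treats only / as a separator, so the sketch converts backslashes to / first to handle names like ..\..\index.htm on any OS:

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical helper: builds the name under which an upload is stored.
string BuildStoredName(string uploadedName, DateTime now)
{
    // Treat both separators as path separators, so "..\..\x" is handled on any OS.
    string normalized = uploadedName.Replace('\\', '/');
    // Keep only the last path segment, discarding any directory part.
    string name = Path.GetFileName(normalized);
    // Remove every whitespace character (spaces, tabs, ...).
    string noSpaces = new string(name.Where(c => !char.IsWhiteSpace(c)).ToArray());
    string stamp = now.ToString("yyyyMMddHHmm", CultureInfo.InvariantCulture);
    return stamp + "_" + noSpaces;
}

var when = new DateTime(2018, 1, 24, 9, 42, 0);
Console.WriteLine(BuildStoredName("my report.pdf", when));
Console.WriteLine(BuildStoredName(@"..\..\index.htm", when));
Console.WriteLine(BuildStoredName("E:/somepath/my innocent program.exe", when));
```

In the controller this would be used as Path.Combine(Server.MapPath("~/App_Data/uploads"), BuildStoredName(file.FileName, DateTime.Now)); you may still want to reject names that end up empty.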
This is how you should be able to fix your problem:
string nombre_archivo = string.Format("{0}_{1}", DateTime.Now.ToString("yyyyMMddHHmm."), new string(info.file.FileName.Where(c => !Char.IsWhiteSpace(c)).ToArray()));
EDIT:
Passing the LINQ query itself to string.Format prints the query's type name rather than its characters, which is why it has to be wrapped in new string(...ToArray()).
string.Replace is simpler than a LINQ query here, so basically it would look like:
string nombre_archivo = string.Format("{0}_{1}", DateTime.Now.ToString("yyyyMMddHHmm."), info.file.FileName.Replace(" ", ""));

Asp.net, Syntax highlight code from file with google prettify

In my page I get an ID from the link parameters and use it to look up a file path in the database. After reading the file, I want to put its contents inside my <pre> tag, so I will have a Literal whose text is:
Code.Text = "<pre>" + File Contents in string + "</pre>";
My question is how to insert the contents there. Do I need to read the file line by line into a string array? If I read it all into one string, I'm afraid the text will look like one huge line on the page.
Also, will it conflict with string literal syntax, since for quotes we have to write \" instead of "?
If you are working with a Literal control, you can use a StringBuilder and its Append method, because that lets you build any HTML from the code-behind.
Something like:
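// Declare your StringBuilder
private StringBuilder stb = new StringBuilder();
You can read the file any way you like and then split it on a character such as \n:
string readFile = File.ReadAllText(filePath); // filePath: the path you looked up in the database
string[] tokens = readFile.Split('\n');
stb.Append("<pre>");
foreach (string s in tokens)
{
    // HTML-encode each line so characters such as < and & are shown as text;
    // <pre> keeps line breaks, so append "\n" rather than a <br> tag
    stb.Append(HttpUtility.HtmlEncode(s.TrimEnd('\r')) + "\n");
}
stb.Append("</pre>");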
Finally, assign the StringBuilder's value to your Literal:
YourLiteral.Text = stb.ToString();
I hope that helps; this way the value won't be on one line. Remember that the line breaks must be present in the file's text for the split to work.
Cheers
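An alternative sketch that avoids splitting altogether: a <pre> element already preserves the file's own line breaks, so the whole text can be read in one go and HTML-encoded. WebUtility.HtmlEncode (System.Net) escapes <, >, & and quotes, so the code is displayed as text. The prettyprint class is what Google's code-prettify conventionally looks for, and BuildPrettifyBlock is a hypothetical helper name. The \" escape only matters inside C# string literals in your source code; text read from a file at run time needs no C# escaping, only HTML encoding.

```csharp
using System;
using System.Net;

// Hypothetical helper: wraps file contents for display with google-prettify.
string BuildPrettifyBlock(string fileContents)
{
    // Escape <, >, & and quotes so the browser shows the code instead of parsing it.
    string encoded = WebUtility.HtmlEncode(fileContents);
    // <pre> keeps the original line breaks, so no <br> tags are needed.
    return "<pre class=\"prettyprint\">" + encoded + "</pre>";
}

string sample = "if (a < b && c > d)\n{\n    Console.WriteLine(\"hi\");\n}";
string html = BuildPrettifyBlock(sample);
Console.WriteLine(html);
```

In the page this would be Code.Text = BuildPrettifyBlock(File.ReadAllText(path)); with path taken from the database lookup.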

Why doesn't razorengine render this as raw?

I am creating data-driven XAML using RazorEngine.
However, I cannot get this to work:
string template = "<k t=\" @Raw(Model.Name) \" ";
var model = new { Name = "CS&#10;" };
string result = Razor.Parse(template, model);
this causes "result" to become
<k t="CS&#10; "
I do not want the "&" to be turned into &amp;.
If I remove any of the following:
the starting character "<"
the space between "k" and "t"
the \"
....then the RazorEngine parser's Raw() function behaves correctly and does not convert "&" into "&amp;".
I was also thinking that I could help Razor understand my intent better by using a code block @{} instead of just @.
However, I haven't figured how I can make a code block emit text to Razor output.
You don't need to do anything: &amp; is actually &.
If you try this code, you will see that it outputs &:
var str = HttpUtility.HtmlDecode("&amp;");
See http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references and http://en.wikipedia.org/wiki/Character_encodings_in_HTML for more info.
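To check what an XML or HTML consumer will actually read from the output, you can decode it once yourself. A small sketch using WebUtility.HtmlDecode (System.Net) instead of HttpUtility so it runs outside ASP.NET; the input strings mirror the output shown in the question:

```csharp
using System;
using System.Net;

// One decoding pass, as an XML/HTML parser would do.
string amp = WebUtility.HtmlDecode("&amp;");
string attribute = WebUtility.HtmlDecode("CS&amp;#10;");
// Only a second pass turns &#10; into a line feed.
string twice = WebUtility.HtmlDecode(attribute);

Console.WriteLine(amp);
Console.WriteLine(attribute);
```

So a parser reading CS&amp;#10; sees the literal text CS&#10;, not a line feed; whether that is acceptable depends on what the attribute is meant to contain.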

extract text from <p>...</p> tag or directly from an HTML file

I have an HTML page that contains some file names that I want to download from a web server.
I need to read these file names in order to create a list that will be passed to my web application, which downloads the files from the server. These file names have an extension.
I have dug into this topic but haven't found anything except:
Regex cannot be used to parse HTML.
Use HTML Agility Pack.
Is there no other way to search an HTML file for text that matches a pattern like filename.ext?
Sample HTML that contains a file name:
<p class=3DMsoNormal style=3D'margin-top:0in;margin-right:0in;margin-bottom=:0in; margin-left:1.5in;margin-bottom:.0001pt;text-indent:-.25in;line-height:normal;mso-list:l1 level3 lfo8;tab-stops:list 1.5in'><![if !supportLists]> <span style=3D'font-family:"Times New Roman","serif";mso-fareast-font-family:"Times New Roman"'><span style=3D'mso-list:Ignore'>1.<span style=3D'font:7.0pt "Times New Roman"'>
</span></span></span><![endif]><span style=3D'font-family:"Times New Roman","serif"; mso-fareast-font-family:"Times New Roman"'>13572_PostAccountingReport_2009-06-03.acc<o:p></o:p></span></p>
I can't use HTML Agility Pack because I'm not allowed to download and use any application or tool.
Can't this be achieved with some other logic?
This is what I have done so far:
string pageSource = "";
string geturl = @"C:\Documents and Settings\NASD_Download.mht";
WebRequest getRequest = WebRequest.Create(geturl);
WebResponse getResponse = getRequest.GetResponse();
using (StreamReader sr = new StreamReader(getResponse.GetResponseStream()))
{
pageSource = sr.ReadToEnd();
pageSource.Replace("=", "");
}
var fileNames = from Match m in Regex.Matches(pageSource, @"[0-9]+_+[A-Za-z]+_+[0-9]+-+[0-9]+-+[0-9]+.+[a-z]")
select m.Value;
foreach (var s in fileNames)
Response.Write(s);
Because some "=" characters occur in every file name, I'm not able to get the file names. How can I remove the occurrences of "=" from the pageSource string?
Thanks in advance
Akhil
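Regarding the "=" characters: a .mht file stores HTML as quoted-printable, where "=" at the end of a line is a soft line break and "=3D" stands for a literal "=". Also, strings in C# are immutable, so pageSource.Replace("=", "") on its own line discards its result; it has to be assigned back. A minimal sketch that undoes only those two artifacts (it is not a full quoted-printable decoder, which would also handle other =XX hex escapes; UndoSoftBreaks and the sample string are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes quoted-printable soft line breaks, then decodes "=3D".
// Soft breaks go first so a real "=3D" followed by a line break is not misread.
string UndoSoftBreaks(string qp) =>
    Regex.Replace(qp, @"=\r?\n", "").Replace("=3D", "=");

string pageSource = "<p class=3DMsoNormal>13572_PostAccounting=\r\nReport_2009-06-03.acc</p>";

// Replace returns a new string, so the result must be assigned.
pageSource = UndoSoftBreaks(pageSource);

// Escaped dot: "\." matches only a literal dot, unlike "." in the original pattern.
Match m = Regex.Match(pageSource, @"\d+_[A-Za-z]+_\d{4}-\d{2}-\d{2}\.[A-Za-z0-9]+");
Console.WriteLine(pageSource);
Console.WriteLine(m.Value);
```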
Well, knowing that regexes aren't ideal for finding values in HTML, here is a client-side JavaScript approach:
var files = [];
var p = document.getElementsByTagName('p');
for (var i = 0; i < p.length; i++){
var match = p[i].innerHTML.match(/\s(\S+\.ext)\s/);
if (match)
files.push(match[1]);
}
Note:
Read the comments to the question.
If the extension can be anything, you can use this:
var files = [];
var p = document.getElementsByTagName('p');
for (var i = 0; i < p.length; i++){
var match = p[i].innerHTML.match(/\b(\S+\.\S+)\b/)
console.log(match)
if (match)
files.push(match[1]);
}
document.getElementById('result').innerHTML = files + "";
But this is really not reliable.
Well, you can use regular expressions to extract stuff that looks like file names. Since, as you correctly point out, regular expressions do not parse HTML, you might get false positives, i.e., you might get results that look like file names but are not.
Let's take an example:
string html = @"<p class=3DMsoNormal ...etc...";
var fileNames = from Match m in Regex.Matches(html, @"\b[A-Za-z0-9_-]+\.[A-Za-z0-9_-]{3}\b")
select m.Value;
foreach (var s in fileNames)
Console.WriteLine(s);
Console.ReadLine();
This will return
1.5in
1.5in
7.0pt
13572_PostAccountingReport_2009-06-03.acc
You see, HTML stuff that looks like a file name will be returned. Of course, you could refine the regular expression (for example, replace + with {3,}, so that at least three characters are required for the part before the dot) so that the false positives in this example are filtered out. Still, it's always going to be an approximate result, not an exact one.
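The refinement suggested above can be checked on an abbreviated version of the sample (the HTML string below is trimmed and hand-written, not the full original): with + the pattern also picks up the style values, while requiring at least three characters before the dot leaves only the file name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Abbreviated version of the question's markup (quoted-printable "3D" removed).
string html = "<p style='margin-left:1.5in;text-indent:-.25in;tab-stops:list 1.5in'>"
            + "<span style='font:7.0pt \"Times New Roman\"'></span>"
            + "<span>13572_PostAccountingReport_2009-06-03.acc</span></p>";

string[] Find(string pattern) =>
    Regex.Matches(html, pattern).Cast<Match>().Select(m => m.Value).ToArray();

// Original pattern: one or more characters before the dot.
string[] loose = Find(@"\b[A-Za-z0-9_-]+\.[A-Za-z0-9_-]{3}\b");
// Refined pattern: at least three characters before the dot.
string[] strict = Find(@"\b[A-Za-z0-9_-]{3,}\.[A-Za-z0-9_-]{3}\b");

Console.WriteLine(string.Join(", ", loose));
Console.WriteLine(string.Join(", ", strict));
```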
It may be impossible to get file names using a common pattern because of 1.5in, -.25in, 7.0pt and the like; try to be more specific (if possible), like
/[a-z0-9_-]+\.[a-z]+/gi or
/>[a-z0-9_-]+\.[a-z]+</gi (markup included) or even
/>\d+_PostAccountingReport_\d+-\d+-\d+\.[a-z]+</gi
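If the file names always have the NNNNN_PostAccountingReport_date.ext shape, the last pattern can be used from C# as well. A sketch on an abbreviated markup fragment (hand-written, without the quoted-printable "3D" sequences) that captures just the name in a group, so the surrounding > and < are not part of the result; RegexOptions.IgnoreCase plays the role of the JavaScript i flag, and Matches the role of g:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Group 1 holds the name without the surrounding > and <.
var reportName = new Regex(@">(\d+_PostAccountingReport_\d+-\d+-\d+\.[a-z]+)<",
                           RegexOptions.IgnoreCase);

string html = "<span style='font-family:\"Times New Roman\"'>"
            + "13572_PostAccountingReport_2009-06-03.acc<o:p></o:p></span>";

var names = reportName.Matches(html)
                      .Cast<Match>()
                      .Select(m => m.Groups[1].Value)
                      .ToList();

names.ForEach(Console.WriteLine);
```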

Find/parse server-side <?abc?>-like tags in html document

I guess I need some regex help. I want to find all tags like <?abc?> so that I can replace each one with the result of running the code inside it. I just need help matching the tag/code string with a regex, not parsing the code inside :p.
<b><?abc print 'test' ?></b> would result in <b>test</b>
Edit: Not specifically abc, but in general, matching (<?[chars] (code group) ?>).
This builds up a new copy of the string source, replacing each <?abc code?> with the result of process(code):
Regex abcTagRegex = new Regex(@"<\?abc(?<code>.*?)\?>");
StringBuilder newSource = new StringBuilder();
int curPos = 0;
foreach (Match abcTagMatch in abcTagRegex.Matches(source)) {
    string code = abcTagMatch.Groups["code"].Value;
    string result = process(code);
    // Copy the text between the previous tag and this one (Substring takes a length, not an end index)
    newSource.Append(source.Substring(curPos, abcTagMatch.Index - curPos));
    newSource.Append(result);
    curPos = abcTagMatch.Index + abcTagMatch.Length;
}
// Copy whatever follows the last tag
newSource.Append(source.Substring(curPos));
source = newSource.ToString();
N.B. I've not been able to test this code, so some of the functions may be slightly the wrong name, or there may be some off-by-one errors.
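The same replacement can also be written without tracking positions by hand, using the Regex.Replace overload that takes a MatchEvaluator. In this sketch Process is a hypothetical stand-in for your interpreter that only understands print '...', and RegexOptions.Singleline lets a tag span several lines:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the real interpreter: handles only print '...'.
string Process(string code)
{
    var m = Regex.Match(code.Trim(), @"^print '(.*)'$");
    return m.Success ? m.Groups[1].Value : "";
}

var abcTag = new Regex(@"<\?abc(?<code>.*?)\?>", RegexOptions.Singleline);
string source = "<b><?abc print 'test' ?></b>";

// The evaluator is called once per match; its return value replaces the whole tag.
string output = abcTag.Replace(source, m => Process(m.Groups["code"].Value));
Console.WriteLine(output);
```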
var regex = new Regex(@"<\?(\w+) (\w+) (.+?)\s*\?>");
This will take this source
<b><?abc print 'test' ?></b>
and break it up like this:
Value: <?abc print 'test' ?>
SubMatch: abc
SubMatch: print
SubMatch: 'test'
These can then be sent to a method that can handle it differently depending on what the parts are.
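A sketch of sending the parts to a handler, as suggested above. Handle is a hypothetical method that only knows the print command, and the pattern has \s* before ?> so that the third group does not end with the space before the closing tag:

```csharp
using System;
using System.Text.RegularExpressions;

var tag = new Regex(@"<\?(\w+) (\w+) (.+?)\s*\?>");

// Hypothetical handler: chooses behaviour from the command part of the tag.
string Handle(string language, string command, string argument) => command switch
{
    "print" => argument.Trim('\''),
    _ => throw new NotSupportedException($"{language}: unknown command '{command}'")
};

string source = "<b><?abc print 'test' ?></b>";
Match first = tag.Match(source);
Console.WriteLine($"{first.Groups[1].Value} | {first.Groups[2].Value} | {first.Groups[3].Value}");

string output = tag.Replace(source, m => Handle(m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value));
Console.WriteLine(output);
```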
If you need more advanced syntax handling, you need to go beyond regex, I believe.
I designed a template engine using ANTLR, but that's way more complex ;)
exp = new Regex(@"<\?abc print '(.+?)' \?>");
str = exp.Replace(str, "$1");
Something like this should do the trick. Change the regexes how you see fit
