Is it possible to convert an array of strings into one string? - c#

In my program, I read in a file using this statement:
string[] allLines = File.ReadAllLines(dataFile);
But I want to apply a Regex to the file as a whole (an example file is shown at the bottom) so I can eliminate certain stuff I'm not concerned with in the file. I can't use ReadAllText as I need to read it line by line for another purpose of the program (removing whitespace from each line).
Regex r = new Regex(@"CREATE TABLE [^\(]+\((.*)\) ON");
(thanks to chiccodoro for this code)
This is the Regex that I want to apply.
Is there any way to change the array back into one text file? Or any other solution to the problem?
One thing that comes to mind is replacing the 'stuff' that I'm not concerned with with string.Empty.
example file
USE [Shelleys Other Database]
CREATE TABLE db.exmpcustomers(
f_name varchar(100) NULL,
l_name varchar(100) NULL,
date_of_birth date NULL,
house_number int NULL,
street_name varchar(100) NULL
) ON [PRIMARY]

You can join string[] into a single string like this
string strmessage = string.Join(",", allLines);
Output: a single comma-separated string.

You can use String.Join():
string joined = String.Join(Environment.NewLine, allLines);
If you just want to write it back to the file, you can use File.WriteAllLines() and that works with an array.
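As a rough sketch of the join described above, the following uses an in-memory array as a stand-in for File.ReadAllLines(dataFile); the class and method names (JoinDemo, JoinLines) are made up for illustration.

```csharp
using System;

public static class JoinDemo
{
    // Joins the lines back into one string, putting a line break between them.
    public static string JoinLines(string[] lines)
    {
        return string.Join(Environment.NewLine, lines);
    }

    public static void Main()
    {
        // Hypothetical stand-in for File.ReadAllLines(dataFile).
        string[] allLines = { "  CREATE TABLE t(  ", "  id int NULL  ", ") ON [PRIMARY]" };

        // The per-line work from the question: remove whitespace from each line.
        for (int i = 0; i < allLines.Length; i++)
        {
            allLines[i] = allLines[i].Trim();
        }

        // One string again, ready for a whole-file Regex.
        Console.WriteLine(JoinLines(allLines));
    }
}
```

Joining with Environment.NewLine rather than an empty separator keeps the line boundaries, so a later Split can recover the same lines.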

String.Join will concatenate all the members of your array using any specified separator.

It's going to be really hard to use regexen to deal with multi-line data a line at a time. So rather than muck about with that, I'm going to suggest that you first read it as one big string, do your multi-line regex business, and then you can split it into an array of strings using String.Split (split on newlines). The reason you want to do it in this order is so that any further operations on your file data will include the changes already made by the regex. If you join the strings, then do the regex, you will either have to split that string again, or lose the changes you've made to it while you operate on the original array.
Remember to use this for your regex matching, so that it will match across newlines:
Regex r = new Regex(@"CREATE TABLE [^\(]+\((.*)\) ON", RegexOptions.Singleline);
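To illustrate why Singleline matters here, this sketch joins a shortened version of the example file and pulls out the text between the outer parentheses. The names TableBodyDemo and ExtractColumns are hypothetical, and joining with "\n" is an assumption made to keep the example deterministic.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TableBodyDemo
{
    // Singleline lets '.' match newline characters, so (.*) can span several lines.
    private static readonly Regex CreateTable =
        new Regex(@"CREATE TABLE [^\(]+\((.*)\) ON", RegexOptions.Singleline);

    // Returns the text between the outer parentheses, or null if nothing matched.
    public static string ExtractColumns(string[] lines)
    {
        string wholeText = string.Join("\n", lines);
        Match m = CreateTable.Match(wholeText);
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }

    public static void Main()
    {
        // Shortened copy of the example file from the question.
        string[] allLines =
        {
            "USE [Shelleys Other Database]",
            "CREATE TABLE db.exmpcustomers(",
            "f_name varchar(100) NULL,",
            "l_name varchar(100) NULL",
            ") ON [PRIMARY]"
        };

        // Prints the two column definitions.
        Console.WriteLine(ExtractColumns(allLines));
    }
}
```

Without RegexOptions.Singleline the same pattern finds no match on this input, because '.' stops at the first line break.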

Just change from
string[] allLines = File.ReadAllLines(dataFile);
to
string allLines = File.ReadAllText(dataFile);
;)

Could you build up a buffer as you read in each line? I have the idea that this might be a bit more efficient than getting all the lines as a string array, then joining them (...though I haven't done a full study of the issue and would be interested to hear if there is some reason that it is actually more efficient to go that way).
StringBuilder buffer = new StringBuilder();
string line = null;
using (StreamReader sr = new StreamReader(dataFile))
{
while((line = sr.ReadLine()) != null)
{
// Do whatever you need to do with the individual line...
// ...then append the line to your buffer. AppendLine keeps a line
// break after each line; plain Append would run the lines together.
buffer.AppendLine(line);
}
}
// Now, you can do whatever you need to do with the contents of
// the buffer.
string wholeText = buffer.ToString();

public string CreateStringFromArray(string[] allLines)
{
StringBuilder builder = new StringBuilder();
foreach (string item in allLines)
{
builder.Append(item);
// Append a line break after each line.
builder.Append(Environment.NewLine);
}
return builder.ToString();
}

Related

How can I find and replace text in a larger file (150MB-250MB) with regular expressions in C#?

I am working with files that range between 150MB and 250MB, and I need to append a form feed (\f) character to each match found in a match collection. Currently, my regular expression for each match is this:
Regex myreg = new Regex("ABC: DEF11-1111(.*?)MORE DATA(.*?)EVEN MORE DATA(.*?)\f", RegexOptions.Singleline);
and I'd like to modify each match in the file (and then overwrite the file) to become something that could be later found with a shorter regular expression:
Regex myreg = new Regex("ABC: DEF11-1111(.*?)\f\f", RegexOptions.Singleline);
Put another way, I want to simply append a form feed character (\f) to each match that is found in my file and save it.
I see a ton of examples on stack overflow for replacing text, but not so much for larger files. Typical examples of what to do would include:
1. Using StreamReader to store the entire file in a string, then doing a find and replace in that string.
2. Using MatchCollection in combination with File.ReadAllText().
3. Reading the file line by line and looking for matches there.
The problem with the first two is that they eat up a ton of memory, and I worry about the program being able to handle all of that. The problem with the 3rd option is that my regular expression spans many rows, and thus will not be found in a single line. I see other posts out there as well, but they cover replacing specific strings of text rather than working with regular expressions.
What would be a good approach for me to append a form feed character to each match found in a file, and then save that file?
Edit:
Per some suggestions, I tried playing around with StreamReader.ReadLine(). Specifically, I would read a line, see if it matched my expression, and then based on that result I would write to a file. If it matched the expression, I would write to the file. If it didn't match the expression, I would just append it to a string until it did match the expression. Like this:
Regex myreg = new Regex("ABC: DEF11-1111(.*?)MORE DATA(.*?)EVEN MORE DATA(.*?)\f", RegexOptions.Singleline);
//For storing/comparing our match.
string line, buildingmatch, match, whatremains;
buildingmatch = "";
match = "";
whatremains = "";
//For keeping track of trailing bits after our match.
int matchlength = 0;
using (StreamWriter sw = new StreamWriter(destFile))
using (StreamReader sr = new StreamReader(srcFile))
{
//While we are still reading lines in the file...
while ((line = sr.ReadLine()) != null)
{
//Keep adding lines to buildingmatch until we can match the regular expression.
buildingmatch = buildingmatch + line + "\r\n";
if (myreg.IsMatch(buildingmatch))
{
match = myreg.Match(buildingmatch).Value;
matchlength = match.Length;
//Make sure we are not at the end of the file.
if (matchlength < buildingmatch.Length)
{
whatremains = buildingmatch.Substring(matchlength, buildingmatch.Length - matchlength);
}
sw.Write(match + "\f\f");
buildingmatch = whatremains;
whatremains = "";
}
}
}
The problem is that this took about 55 minutes to run a roughly 150MB file. There HAS to be a better way to do this...
If you can load the whole string data into a single string variable, there is no need to first match and then append text to matches in a loop. You can use a single Regex.Replace operation:
string text = File.ReadAllText(srcFile);
using (StreamWriter sw = new StreamWriter(destfile, false, Encoding.UTF8, 5242880))
{
sw.Write(myregex.Replace(text, "$&\f\f"));
}
Details:
string text = File.ReadAllText(srcFile); - reads the srcFile file to the text variable (match would be confusing)
myregex.Replace(text, "$&\f\f") - replaces all occurrences of myregex matches with themselves ($& is a backreference to the whole match value) while appending two \f chars right after each match.
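Here is a small, self-contained sketch of the same Replace call on an in-memory string instead of a file. The two-record sample text and the names AppendFormFeedDemo and AppendFormFeeds are invented for illustration; the pattern is the one from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AppendFormFeedDemo
{
    // Pattern from the question; Singleline lets (.*?) cross line breaks.
    private static readonly Regex Record = new Regex(
        "ABC: DEF11-1111(.*?)MORE DATA(.*?)EVEN MORE DATA(.*?)\f",
        RegexOptions.Singleline);

    // Appends two form feeds after every match; "$&" stands for the whole match.
    public static string AppendFormFeeds(string text)
    {
        return Record.Replace(text, "$&\f\f");
    }

    public static void Main()
    {
        // Hypothetical record layout, just enough to satisfy the pattern.
        string record = "ABC: DEF11-1111 a\nMORE DATA b\nEVEN MORE DATA c\f";
        string input = record + record;

        string result = AppendFormFeeds(input);

        // Two matches, two extra characters each: prints 4.
        Console.WriteLine(result.Length - input.Length);
    }
}
```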
I was able to find a solution that works in a reasonable time; it can process my entire 150MB file in under 5 minutes.
First, as mentioned in the comments, it's a waste to compare the string to the Regex after every iteration. Rather, I started with this:
string match = File.ReadAllText(srcFile);
MatchCollection mymatches = myregex.Matches(match);
Strings can hold up to 2GB of data, so while not ideal, I figured roughly 150MB worth wouldn't hurt to be stored in a string. Then, as opposed to checking a match every x amount of lines read in from the file, I can check the file for matches all at once!
Next, I used this:
StringBuilder matchsb = new StringBuilder(134217728);
foreach (Match m in mymatches)
{
matchsb.Append(m.Value + "\f\f");
}
Since I already know (roughly) the size of my file, I can go ahead and initialize my stringbuilder. Not to mention, it's a lot more efficient to use string builder if you are doing multiple operations on a string (which I was). From there, it's just a matter of appending the form feed to each of my matches.
Finally, the part that cost the most on performance:
using (StreamWriter sw = new StreamWriter(destfile, false, Encoding.UTF8, 5242880))
{
sw.Write(matchsb.ToString());
}
The way that you initialize StreamWriter is critical. Normally, you just declare it as:
StreamWriter sw = new StreamWriter(destfile);
This is fine for most use cases, but the problem becomes apparent when you are dealing with larger files. When declared like this, you are writing to the file with a default buffer of 4KB. For a smaller file, this is fine. But for 150MB files? This will end up taking a long time. So I corrected the issue by changing the buffer to approximately 5MB.
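A minimal sketch of writing with an explicit buffer size, using a temporary file so it can be run anywhere. The helper name WriteWithBuffer and the 1,000,000-character payload are assumptions for illustration, and whether 5MB is the best size for a given machine is something to measure rather than take as given.

```csharp
using System;
using System.IO;
using System.Text;

public static class BufferedWriteDemo
{
    // Writes content to path, overwriting it, with the given buffer size in bytes.
    public static void WriteWithBuffer(string path, string content, int bufferSize)
    {
        using (var sw = new StreamWriter(path, false, Encoding.UTF8, bufferSize))
        {
            sw.Write(content);
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // Hypothetical payload standing in for the processed file contents.
            string content = new string('x', 1000000);
            WriteWithBuffer(path, content, 5 * 1024 * 1024);
            Console.WriteLine("Wrote " + new FileInfo(path).Length + " bytes");
        }
        finally
        {
            // Clean up the temporary file.
            File.Delete(path);
        }
    }
}
```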
I found this resource really helped me to understand how to write to files more efficiently: https://www.jeremyshanks.com/fastest-way-to-write-text-files-to-disk-in-c/
Hopefully this will help the next person along as well.

string.contains and string.replace in one single line of code

I'm currently writing some software where I have to load a lot of column names from an external file. Usually I would do this with some JSON, but for reasons of user-friendliness I can't do it right now. I need to use a text file which is readable to the users and includes a lot of comments.
So I have created my own file to hold all of these values.
Now when I'm importing these values in my software, I essentially run through my config file line by line and check for every line whether it matches a parameter, which I then parse. But this way I end up with a big code block with very repetitive code, and I was wondering whether I could simplify it so that every check is done in just one line.
Here is the code I'm currently using:
if (line.Contains("[myValue]"))
{
myParameter = line.Replace("[myValue]", string.Empty).Trim();
}
I know that using Linq you can simplify things and put them in one single line, I'm just not sure if it would work in this case?
Thanks for your help!
Kenneth
Why not just create a method if this piece of code is repeated often:
void SetParameter(string line, string name, ref string parameter)
{
if (line.Contains(name))
{
parameter = line.Replace(name, string.Empty).Trim();
}
}
SetParameter(line, "[myValue]", ref myParameter);
If you want to avoid calling both Replace and Contains, which is probably a good idea, you could also just call Replace:
void SetParameter(string line, string name, ref string parameter)
{
var replaced = line.Replace(name, string.Empty);
if (line != replaced)
{
parameter = replaced.Trim();
}
}
Try this way (ternary):
myParameter = line.Contains("[myValue]")?line.Replace("[myValue]", string.Empty).Trim():myParameter;
Actually, line.IndexOf should be faster.
From your code, it looks like you are replacing with just empty text, so why not take the entire string (consisting of many lines) and replace in one shot, instead of checking one line at a time?
You could use RegEx. This might possibly relieve you of some repetitive code
string line = "[myvalue1] some string [someotherstring] [myvalue2]";
// All your Keys stored at a single place
string[] keylist = new string[] { @"\[myvalue1]", @"\[myvalue2]" };
var newString = Regex.Replace(line, string.Join("|", keylist), string.Empty);
Hope it helps.
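A variation on the Regex approach above, sketched under the assumption that the keys are stored unescaped: Regex.Escape builds the pattern, so the brackets do not have to be escaped by hand. The names KeyStripDemo and StripKeys are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeyStripDemo
{
    // Removes every occurrence of any key from the line and trims the result.
    public static string StripKeys(string line, string[] keys)
    {
        // Regex.Escape makes each key match literally, e.g. "[myvalue1]".
        string pattern = string.Join("|", keys.Select(Regex.Escape));
        return Regex.Replace(line, pattern, string.Empty).Trim();
    }

    public static void Main()
    {
        // All keys stored in a single place, written as they appear in the file.
        string[] keylist = { "[myvalue1]", "[myvalue2]" };
        string line = "[myvalue1] some string [myvalue2]";

        Console.WriteLine(StripKeys(line, keylist));
    }
}
```

Note that this only strips the keys; it does not tell you which key was found, so it suits the "replace everything in one shot" idea rather than assigning values to individual parameters.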

Read a CSV file and write into a file without " " using C#

I am trying to read a CSV file and store all the values in a single list. The CSV file contains credentials as uid (user ID) and pass (password), separated by ','. I have successfully read all the lines and written them to a file, but each value is written between double quotes, like "abcdefgh3 12345678". What I actually want is to remove these double quotes when I write to the file. Here is my code:
static void Main(string[] args)
{
var reader = new StreamReader(File.OpenRead(@"C:\Desktop\userid1.csv"));
List<string> listA = new List<string>();
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
var values = line.Split(',');
listA.Add(values[0]);
listA.Add(values[1]);
}
foreach (string a in listA)
{
TextWriter tr = new StreamWriter(@"E:\newfiless",true);
tr.Write(a);
tr.Write(tr.NewLine);
tr.Close();
}
}
and the resulted output is like this:
"uid
pass"
"Martin123
123456789"
"Damian
91644"
but i want in this form:
uid
pass
Martin123
123456789
Damian
91644
Thanking you all in advance.
The original file clearly has quotes, which makes it a CSV file with only one column, and in that column there are two values. Not usual, but it happens.
To actually remove quotes you can use Trim, TrimEnd or TrimStart.
You can remove the quotes while reading, or while writing, in this case it doesn't really matter.
var line = reader.ReadLine().Trim('"');
This will remove the quotes while reading. Note that this assumes the CSV is of this "broken" variant.
tr.WriteLine(a.Trim('"'));
This will handle it on write. This will work even if the file is "correct" CSV having two columns and values in quotes.
Note that you can use WriteLine to add the newline, no need for two Write calls.
Also as others have commented, don't create a TextWriter in the loop for every value, create it once.
using (TextWriter tr = new StreamWriter(@"E:\newfiless"))
{
foreach (string a in listA)
{
tr.WriteLine(a.Trim('"'));
}
}
The using will take care of closing the file and other possible resources even if there is an exception.
I assume that all you need is to read the input file, strip out all starting/ending quotation marks, then split by comma and write it all to another file. You can actually accomplish it in a one-liner using SelectMany, which will produce a "flat" collection:
File.WriteAllLines(
@"c:\temp\output.txt",
File
.ReadAllLines(@"c:\temp\input.csv")
.SelectMany(line => line.Trim('"').Split(','))
);
It's not quite clear from your example where quotation marks are located in the file. For a typical .CSV file some comma-separated field might be wrapped in quotation marks to allow commas to be a part of the content. If it's the case, then parsing will be more complex.
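The flattening step can be tried without touching the file system. This sketch runs the same Trim/Split/SelectMany chain over in-memory lines that mimic the "whole line in quotes" layout from the question; FlattenCsvDemo and Flatten are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FlattenCsvDemo
{
    // Strips surrounding quotes from each line, splits on commas,
    // and returns all values as one flat list.
    public static List<string> Flatten(IEnumerable<string> lines)
    {
        return lines
            .SelectMany(line => line.Trim('"').Split(','))
            .ToList();
    }

    public static void Main()
    {
        // Stand-in for File.ReadAllLines on the quoted input file.
        string[] lines =
        {
            "\"uid,pass\"",
            "\"Martin123,123456789\"",
            "\"Damian,91644\""
        };

        // One value per output line, with no quotes.
        foreach (string value in Flatten(lines))
        {
            Console.WriteLine(value);
        }
    }
}
```

As noted above, this simple split is only safe when no field contains a comma inside quotes.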
You can use
tr.Write(a.Substring(1, a.Length - 2));
Edited
Please use Trim
tr.Write(a.TrimEnd('"').TrimStart('"'));

Splitting strings using Environment.NewLine leaves \n in most array items?

I used MyString.Split(Environment.NewLine.ToCharArray()[0]) to split my string from a file into different pieces. But every item in the array except the first one starts with \n after I did that. I know the way that I'm splitting by newlines is kind of "cheaty" for lack of a better word, so if there is a better way of doing this, please tell me...
Here is the file...
If you are wanting to maintain using the .Split() instead of reading a file in a line at a time you can do...
var splitResult = MyString.Split( new string[]{ System.Environment.NewLine },
System.StringSplitOptions.RemoveEmptyEntries );
/* or System.StringSplitOptions.None if you want empty results as well */
EDIT:
The problem you were having is that in a non-unix environment the new-line "character" is actually two characters. So when you grabbed the zero index you were actually splitting on a carriage return...not the new-line character (\n).
Windows = "\r\n"
Unix = "\n"
Per http://msdn.microsoft.com/en-us/library/system.environment.newline.aspx
A newline in Windows is two characters (\r and \n). The Environment.NewLine.ToCharArray()[0] expression specifies only one of those characters: \r. Therefore, the other character (\n) remains as a portion of the split string.
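The difference is easy to reproduce. In this sketch the newline is passed in explicitly as "\r\n" (an assumption made so the result does not depend on the platform the code runs on), and the class and method names are invented for illustration.

```csharp
using System;

public static class SplitDemo
{
    // The approach from the question: split on the first character only.
    public static string[] SplitOnFirstChar(string text, string newline)
    {
        return text.Split(newline.ToCharArray()[0]);
    }

    // Split on the complete newline sequence instead.
    public static string[] SplitOnWholeNewline(string text, string newline)
    {
        return text.Split(new[] { newline }, StringSplitOptions.None);
    }

    public static void Main()
    {
        string windowsNewline = "\r\n";
        string text = "one\r\ntwo\r\nthree";

        // Items after the first keep a leading '\n'.
        string[] broken = SplitOnFirstChar(text, windowsNewline);
        Console.WriteLine(broken[1].StartsWith("\n"));   // True

        // Splitting on the whole sequence leaves clean items.
        string[] clean = SplitOnWholeNewline(text, windowsNewline);
        Console.WriteLine(clean[1]);                     // two
    }
}
```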
May I suggest you read your file using something like this:
public IEnumerable<string> ReadFile(string filePath)
{
using (StreamReader reader = new StreamReader(filePath))
{
string line;
while ( (line = reader.ReadLine()) != null)
{
yield return line;
}
}
}
You might need more error handling, or to specify different file open options, or to pass a stream to the method rather than the path, but the idea of using an iterator over the ReadLine() method is sound. The result is you can just use code like this:
foreach (string line in ReadFile(" ... my file path ... "))
{
}

Replacing each letter of the alphabet in a string?

That's what I've written so far:
string omgwut;
omgwut = textBox1.Text;
omgwut = omgwut.Replace(" ", "snd\\space.wav");
omgwut = omgwut.Replace("a", "snd\\a.wav");
Now, the problem is that this code would turn
"snd\space.wav"
into
"snd\spsnd\a.wavce.wsnd\a.wavv"
in line four. Not what I'd want! Now I know I'm not good at C#, so that's why I'm asking.
Solutions would be great! Thanks!
You'll still need to write the getSoundForChar() function, but this should do what you're asking. I'm not sure, though, that what you're asking will do what you want, i.e., play the sound for the associated character. You might be better off putting them in a List<string> for that.
StringBuilder builder = new StringBuilder();
foreach (char c in textBox1.Text)
{
string sound = getSoundForChar( c );
builder.Append( sound );
}
string omgwut = builder.ToString();
Here's a start:
public string getSoundForChar( char c )
{
string sound = null;
if (c == ' ')
{
sound = "snd\\space.wav";
}
// ... handle other special characters with more else-if branches here
else
{
sound = string.Format( "snd\\{0}.wav", c );
}
return sound;
}
The problem is that you are doing multiple passes of the data. Try just stepping through the characters of the string in a loop and replacing each 'from' character by its 'to' string. That way you're not going back over the string and re-doing those characters already replaced.
Also, create a separate output string or array, instead of modifying the original. Ideally use a StringBuilder, and append the new string (or the original character if not replacing this character) to it.
I do not know of a way to simultaneously replace different characters in C#.
You could loop over all characters and build a result string from that (use a stringbuilder if the input string can be long). For each character, you append its replacement to the result string(builder).
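One way to sketch the single-pass idea is with a lookup table from character to replacement, so already-replaced text is never scanned again. The Dictionary-based table and the names SinglePassReplaceDemo and ReplaceChars are assumptions for illustration, not something from the answers above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SinglePassReplaceDemo
{
    // Walks the input once; each character is either replaced via the map
    // or copied unchanged. Replacement text is never re-examined.
    public static string ReplaceChars(string input, IDictionary<char, string> map)
    {
        var result = new StringBuilder();
        foreach (char c in input)
        {
            string replacement;
            if (map.TryGetValue(c, out replacement))
            {
                result.Append(replacement);
            }
            else
            {
                result.Append(c);
            }
        }
        return result.ToString();
    }

    public static void Main()
    {
        // Hypothetical mapping mirroring the two Replace calls in the question.
        var sounds = new Dictionary<char, string>
        {
            { ' ', "snd\\space.wav" },
            { 'a', "snd\\a.wav" }
        };

        // The 'a' characters inside "snd\space.wav" are left alone.
        Console.WriteLine(ReplaceChars("a a", sounds));
    }
}
```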
But what are you trying to do? I cannot think of a useful application of appending file paths without any separator.
