Problems with text Encoding - c#

I read a text file in ANSI or UTF-8 encoding. The file consists of lines like these:
79005213750:hello
79005213751:привет
79005213752:серега
I read it with this code:
TextReader readFile = new StreamReader(file_path, Encoding.Default);
foreach (string line in ReadLineFromFile(readFile))
{}
private static IEnumerable<string> ReadLineFromFile(TextReader fileReader)
{
using (fileReader)
{
string currentLine;
while ((currentLine = fileReader.ReadLine()) != null)
{
yield return currentLine;
}
}
}
After all manipulations with the lines, I save them:
SaveFileDialog saveFile1 = new SaveFileDialog();
saveFile1.DefaultExt = "*.txt";
saveFile1.Filter = "TXT Files|*.txt";
saveFile1.FileName = "rus_number-pass";
if (saveFile1.ShowDialog() == System.Windows.Forms.DialogResult.OK && saveFile1.FileName.Length > 0)
{
using (System.IO.StreamWriter file = new System.IO.StreamWriter(saveFile1.FileName))
foreach (string line in digits_ru)
{
file.WriteLine(line);
}
}
In the output I receive:
79005213750:hello
79005213751:привет
79005213752:серега
But I expect:
79005213750:hello
79005213751:привет
79005213752:серега
Can you help me? I have spent two days on this problem but can't solve it.

I believe you are using one encoding (Encoding.Default) for reading and another one (UTF-8, the StreamWriter default) for writing.
Use a different overload of the System.IO.StreamWriter constructor, e.g. this one:
public StreamWriter(string path, bool append, Encoding encoding)
and as the encoding parameter pass the same default encoding you pass into the Reader.
TextReader readFile = new StreamReader(file_path, Encoding.Default);
I think that after that you'll see the expected characters in the output file.
By the way, be aware that using Encoding.Default is not recommended.
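The mismatch described in this answer can be avoided by passing one and the same Encoding object to both the reader and the writer. The following is only a minimal sketch: the class and method names (EncodingRoundTrip, ReadLines, WriteLines) are made up for illustration, and the caller is assumed to know which encoding the input file really uses.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

static class EncodingRoundTrip
{
    // Reads every line of the file, decoding it with the given encoding.
    public static List<string> ReadLines(string path, Encoding encoding)
    {
        var lines = new List<string>();
        using (var reader = new StreamReader(path, encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                lines.Add(line);
        }
        return lines;
    }

    // Writes the lines with the same encoding that was used for reading,
    // instead of relying on the StreamWriter default (UTF-8).
    public static void WriteLines(string path, IEnumerable<string> lines, Encoding encoding)
    {
        using (var writer = new StreamWriter(path, false, encoding))
        {
            foreach (string line in lines)
                writer.WriteLine(line);
        }
    }
}
```

Usage would look like `var enc = new UTF8Encoding(false); var lines = EncodingRoundTrip.ReadLines(inPath, enc); EncodingRoundTrip.WriteLines(outPath, lines, enc);`, where inPath and outPath are placeholders.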

You can specify the target encoding by using the codepage number:
var encoding = Encoding.GetEncoding(1251);
Code page 1251 is Windows Cyrillic, which covers Russian; I presume that is what you need.


Replace a line in text file without creating another file [duplicate]

I have two text files, Source.txt and Target.txt. The source will never be modified and contains N lines of text. I want to delete a specific line of text in Target.txt and replace it with a specific line of text from Source.txt. I know which line number I need; it is line number 2 in both files.
I have something like this:
string line = string.Empty;
int line_number = 1;
int line_to_edit = 2;
using StreamReader reader = new StreamReader(@"C:\target.xml");
using StreamWriter writer = new StreamWriter(@"C:\target.xml");
while ((line = reader.ReadLine()) != null)
{
if (line_number == line_to_edit)
writer.WriteLine(line);
line_number++;
}
But when I open the writer, the target file gets erased. It writes the lines, but when I open the target file afterwards it only contains the copied lines; the rest is lost.
What can I do?
The easiest way is:
static void lineChanger(string newText, string fileName, int line_to_edit)
{
string[] arrLine = File.ReadAllLines(fileName);
arrLine[line_to_edit - 1] = newText;
File.WriteAllLines(fileName, arrLine);
}
Usage:
lineChanger("new content for this line" , "sample.text" , 34);
You can't rewrite a line without rewriting the entire file (unless the lines happen to be the same length). If your files are small then reading the entire target file into memory and then writing it out again might make sense. You can do that like this:
using System;
using System.IO;
class Program
{
static void Main(string[] args)
{
int line_to_edit = 2; // Warning: 1-based indexing!
string sourceFile = "source.txt";
string destinationFile = "target.txt";
// Read the appropriate line from the file.
string lineToWrite = null;
using (StreamReader reader = new StreamReader(sourceFile))
{
for (int i = 1; i <= line_to_edit; ++i)
lineToWrite = reader.ReadLine();
}
if (lineToWrite == null)
throw new InvalidDataException("Line does not exist in " + sourceFile);
// Read the old file.
string[] lines = File.ReadAllLines(destinationFile);
// Write the new file over the old file.
using (StreamWriter writer = new StreamWriter(destinationFile))
{
for (int currentLine = 1; currentLine <= lines.Length; ++currentLine)
{
if (currentLine == line_to_edit)
{
writer.WriteLine(lineToWrite);
}
else
{
writer.WriteLine(lines[currentLine - 1]);
}
}
}
}
}
If your files are large it would be better to create a new file so that you can read streaming from one file while you write to the other. This means that you don't need to have the whole file in memory at once. You can do that like this:
using System;
using System.IO;
class Program
{
static void Main(string[] args)
{
int line_to_edit = 2;
string sourceFile = "source.txt";
string destinationFile = "target.txt";
string tempFile = "target2.txt";
// Read the appropriate line from the file.
string lineToWrite = null;
using (StreamReader reader = new StreamReader(sourceFile))
{
for (int i = 1; i <= line_to_edit; ++i)
lineToWrite = reader.ReadLine();
}
if (lineToWrite == null)
throw new InvalidDataException("Line does not exist in " + sourceFile);
// Read from the target file and write to a new file.
int line_number = 1;
string line = null;
using (StreamReader reader = new StreamReader(destinationFile))
using (StreamWriter writer = new StreamWriter(tempFile))
{
while ((line = reader.ReadLine()) != null)
{
if (line_number == line_to_edit)
{
writer.WriteLine(lineToWrite);
}
else
{
writer.WriteLine(line);
}
line_number++;
}
}
// TODO: Delete the old file and replace it with the new file here.
}
}
You can move the file afterwards, once you are sure that the write operation has succeeded (no exception was thrown and the writer is closed).
Note that in both cases it is a bit confusing that you are using 1-based indexing for your line numbers. It might make more sense in your code to use 0-based indexing. You can have a 1-based index in the user interface of your program if you wish, but convert it to a 0-based index before passing it on.
Also, a disadvantage of directly overwriting the old file with the new file is that if it fails halfway through then you might permanently lose whatever data wasn't written. By writing to a third file first you only delete the original data after you are sure that you have another (corrected) copy of it, so you can recover the data if the computer crashes halfway through.
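The "write to a temporary file, then swap" idea can be sketched as follows. This is only one possible way to fill in the TODO above, not the answerer's own code: the class name SafeReplace and the ".tmp" / ".bak" suffixes are assumptions for illustration. File.Replace swaps the new file in and keeps the previous content as a backup.

```csharp
using System.IO;

static class SafeReplace
{
    // Replaces line `lineToEdit` (1-based) of targetFile with newLine.
    // The new content goes to a temporary file first; the original is only
    // swapped out after the write has finished without an exception.
    public static void ReplaceLine(string targetFile, int lineToEdit, string newLine)
    {
        string tempFile = targetFile + ".tmp";   // hypothetical naming scheme
        string backupFile = targetFile + ".bak"; // hypothetical naming scheme

        using (var reader = new StreamReader(targetFile))
        using (var writer = new StreamWriter(tempFile))
        {
            string line;
            int lineNumber = 1;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(lineNumber == lineToEdit ? newLine : line);
                lineNumber++;
            }
        }

        // Both streams are closed at this point. File.Replace moves tempFile
        // over targetFile and stores the previous content in backupFile.
        File.Replace(tempFile, targetFile, backupFile);
    }
}
```

The backup file can be deleted once the result has been checked. Note that this sketch uses the default StreamReader/StreamWriter encoding (UTF-8); pass an explicit Encoding to both if the file uses something else.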
A final remark: I noticed that your files had an xml extension. You might want to consider if it makes more sense for you to use an XML parser to modify the contents of the files instead of replacing specific lines.
When you create a StreamWriter this way, it always creates the file from scratch. You will have to create a third file, copy the lines from the target into it while replacing what you need, and then replace the old file with it.
But as far as I can see, what you need is XML manipulation; you might want to use XmlDocument and modify your file using XPath.
You need to open the output file for write access rather than using a new StreamWriter, which always overwrites the output file.
FileStream stm = null;
FileInfo fi = new FileInfo(@"C:\target.xml");
if (fi.Exists)
stm = fi.OpenWrite();
Of course, you will still have to seek to the correct line in the output file, which will be hard since you can't read from it, so unless you already KNOW the byte offset to seek to, you probably really want read/write access.
FileStream stm = fi.Open(FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None);
With this stream, you can read until you get to the point where you want to make changes, then write. Keep in mind that you are writing bytes, not lines, so to overwrite a line you will need to write exactly the same number of bytes as the line you want to change.
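To make the "same number of bytes" constraint concrete, here is a minimal sketch of an in-place overwrite. It assumes the caller already knows the byte offset of the line and that the replacement encodes to exactly as many bytes as the text it overwrites; the names InPlaceEdit and Overwrite are made up for this example.

```csharp
using System;
using System.IO;
using System.Text;

static class InPlaceEdit
{
    // Overwrites the bytes starting at byteOffset with the encoded replacement.
    // Nothing else in the file is moved, so the replacement must have the same
    // byte length as the text it replaces (assumption, not checked here).
    public static void Overwrite(string path, long byteOffset, string replacement, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(replacement);
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
        {
            // Refuse to write past the end of the existing file.
            if (byteOffset < 0 || byteOffset + bytes.Length > fs.Length)
                throw new ArgumentOutOfRangeException(nameof(byteOffset));

            fs.Seek(byteOffset, SeekOrigin.Begin);
            fs.Write(bytes, 0, bytes.Length);
        }
    }
}
```

If the new line is longer or shorter than the old one, this approach does not apply and the file has to be rewritten as described in the other answers.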
I guess the code below should work (instead of the writer part from your example). Unfortunately I have no build environment at hand, so it is written from memory, but I hope it helps.
// Untested. Note that StreamReader reads ahead into its own buffer, so the
// position of fs is not necessarily at the end of the line that was just read.
using (var fs = File.Open(filePath, FileMode.Open, FileAccess.ReadWrite))
{
var destinationReader = new StreamReader(fs);
var writer = new StreamWriter(fs);
while ((line = reader.ReadLine()) != null)
{
if (line_number == line_to_edit)
{
writer.WriteLine(lineToWrite);
}
else
{
destinationReader.ReadLine();
}
line_number++;
}
}
The solution works fine. But I need to change a single line of text when the same text occurs in multiple places. For this, I define a trackText after which the search starts, and then replace the first line containing oldText with newText.
private int FindLineNumber(string fileName, string trackText, string oldText, string newText)
{
int lineNumber = 0;
bool traced = false; // becomes true once trackText has been seen
string[] textLine = System.IO.File.ReadAllLines(fileName);
for (int i = 0; i < textLine.Length; i++)
{
if (textLine[i].Contains(trackText)) // start looking for oldText from here on
traced = true;
if (traced)
if (textLine[i].Contains(oldText)) // matching line found
{
textLine[i] = newText; // replace the whole line with the new text
traced = false;
System.IO.File.WriteAllLines(fileName, textLine);
lineNumber = i;
break; // leave the loop
}
}
return lineNumber;
}

StreamReader not working as expected

I have written a simple utility that loops through all C# files in my project and updates the copyright text at the top.
For example, a file may look like this;
//Copyright My Company, © 2009-2010
The program should update the text to look like this;
//Copyright My Company, © 2009-2011
However, the code I have written results in this;
//Copyright My Company, � 2009-2011
Here is the code I am using;
public bool ModifyFile(string filePath, List<string> targetText, string replacementText)
{
if (!File.Exists(filePath)) return false;
if (targetText == null || targetText.Count == 0) return false;
if (string.IsNullOrEmpty(replacementText)) return false;
string modifiedFileContent = string.Empty;
bool hasContentChanged = false;
//Read in the file content
using (StreamReader reader = File.OpenText(filePath))
{
string file = reader.ReadToEnd();
//Replace any target text with the replacement text
foreach (string text in targetText)
modifiedFileContent = file.Replace(text, replacementText);
if (!file.Equals(modifiedFileContent))
hasContentChanged = true;
}
//If we haven't modified the file, dont bother saving it
if (!hasContentChanged) return false;
//Write the modifications back to the file
using (StreamWriter writer = new StreamWriter(filePath))
{
writer.Write(modifiedFileContent);
}
return true;
}
Any help/suggestions are appreciated. Thanks!
This is an encoding problem.
I think you should change this line
using (StreamWriter writer = new StreamWriter(filePath))
To a variant that saves with the correct encoding (the overload that looks like this)
using (StreamWriter writer = new StreamWriter(filePath, false, myEncoding))
To get the correct encoding, add this line where you have opened the file, after reading from it (CurrentEncoding is only reliable after the first read):
myEncoding = reader.CurrentEncoding;
Try to use
StreamWriter(string path, bool append, Encoding encoding)
i.e.
new StreamWriter(filePath, false, new UTF8Encoding())
Get the Encoding from reader and use it in writer.
Changed code:
public bool ModifyFile(string filePath, List<string> targetText, string replacementText)
{
if (!File.Exists(filePath)) return false;
if (targetText == null || targetText.Count == 0) return false;
if (string.IsNullOrEmpty(replacementText)) return false;
string modifiedFileContent = string.Empty;
bool hasContentChanged = false;
Encoding sourceEncoding = null;
using (StreamReader reader = File.OpenText(filePath))
{
string file = reader.ReadToEnd();
// CurrentEncoding is only reliable after the first read
sourceEncoding = reader.CurrentEncoding;
// Apply every replacement to the accumulated result, not to the original text
modifiedFileContent = file;
foreach (string text in targetText)
modifiedFileContent = modifiedFileContent.Replace(text, replacementText);
if (!file.Equals(modifiedFileContent))
hasContentChanged = true;
}
if (!hasContentChanged) return false;
using (StreamWriter writer = new StreamWriter(filePath, false, sourceEncoding))
{
writer.Write(modifiedFileContent);
}
return true;
}
You have to specify the encoding
System.Text.Encoding.UTF8 should do the trick.
Once you've sorted it please promise me to read this.
I'll bet it's related to the encoding of the file contents. Make sure you instantiate your StreamWriter with the correct encoding. ( http://msdn.microsoft.com/en-us/library/f5f5x7kt.aspx )
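The advice in these answers (read with a known encoding, then write with the same one) can be sketched as below. This is an illustrative variant, not the code from any of the answers: the names EncodingPreservingEdit and ReplaceText are invented, and the fallback parameter is an assumption, namely that the caller knows which encoding files without a byte order mark use.

```csharp
using System.IO;
using System.Text;

static class EncodingPreservingEdit
{
    // Replaces oldText with newText and writes the file back in the encoding
    // it was read with. Returns true if the file was changed.
    // `fallback` is used when the file has no byte order mark (assumption:
    // the caller knows what that encoding is).
    public static bool ReplaceText(string path, string oldText, string newText, Encoding fallback)
    {
        string content;
        Encoding detected;
        using (var reader = new StreamReader(path, fallback, true))
        {
            content = reader.ReadToEnd();
            // CurrentEncoding reflects BOM detection only after the first read.
            detected = reader.CurrentEncoding;
        }

        string modified = content.Replace(oldText, newText);
        if (modified == content) return false;

        using (var writer = new StreamWriter(path, false, detected))
        {
            writer.Write(modified);
        }
        return true;
    }
}
```

For the copyright example, a call might look like `EncodingPreservingEdit.ReplaceText(file, "2009-2010", "2009-2011", Encoding.GetEncoding("iso-8859-1"))`, assuming the source files really are Latin-1.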

Delete specific line from a text file?

I need to delete an exact line from a text file, but I cannot for the life of me work out how to go about doing this.
Any suggestions or examples would be greatly appreciated?
Related Questions
Efficient way to delete a line from a text file (C#)
If the line you want to delete is based on the content of the line:
string line = null;
string line_to_delete = "the line i want to delete";
using (StreamReader reader = new StreamReader("C:\\input")) {
using (StreamWriter writer = new StreamWriter("C:\\output")) {
while ((line = reader.ReadLine()) != null) {
if (String.Compare(line, line_to_delete) == 0)
continue;
writer.WriteLine(line);
}
}
}
Or if it is based on line number:
string line = null;
int line_number = 0;
int line_to_delete = 12;
using (StreamReader reader = new StreamReader("C:\\input")) {
using (StreamWriter writer = new StreamWriter("C:\\output")) {
while ((line = reader.ReadLine()) != null) {
line_number++;
if (line_number == line_to_delete)
continue;
writer.WriteLine(line);
}
}
}
The best way to do this is to open the file in text mode, read each line with ReadLine(), and then write it to a new file with WriteLine(), skipping the one line you want to delete.
There is no generic delete-a-line-from-file function, as far as I know.
One way to do it if the file is not very big is to load all the lines into an array:
string[] lines = File.ReadAllLines("filename.txt");
string[] newLines = RemoveUnnecessaryLine(lines);
File.WriteAllLines("filename.txt", newLines);
Hope this simple and short code will help.
List<string> linesList = File.ReadAllLines("myFile.txt").ToList();
linesList.RemoveAt(0);
File.WriteAllLines("myFile.txt", linesList.ToArray());
OR use this
public void DeleteLinesFromFile(string strLineToDelete)
{
string strFilePath = "Provide the path of the text file";
string strSearchText = strLineToDelete;
string strOldText;
string n = "";
StreamReader sr = File.OpenText(strFilePath);
while ((strOldText = sr.ReadLine()) != null)
{
if (!strOldText.Contains(strSearchText))
{
n += strOldText + Environment.NewLine;
}
}
sr.Close();
File.WriteAllText(strFilePath, n);
}
You can actually use C# generics for this to make it real easy:
var file = new List<string>(System.IO.File.ReadAllLines("C:\\path"));
file.RemoveAt(12);
File.WriteAllLines("C:\\path", file.ToArray());
This can be done in three steps:
// 1. Read the content of the file
string[] readText = File.ReadAllLines(path);
// 2. Empty the file
File.WriteAllText(path, String.Empty);
// 3. Fill up again, but without the deleted line
using (StreamWriter writer = new StreamWriter(path))
{
foreach (string s in readText)
{
if (!s.Equals(lineToBeRemoved))
{
writer.WriteLine(s);
}
}
}
Read and remember each line
Identify the one you want to get rid of
Forget that one
Write the rest back over the top of the file
I cared about the file's original end line characters ("\n" or "\r\n") and wanted to maintain them in the output file (not overwrite them with what ever the current environment's char(s) are like the other answers appear to do). So I wrote my own method to read a line without removing the end line chars then used it in my DeleteLines method (I wanted the option to delete multiple lines, hence the use of a collection of line numbers to delete).
DeleteLines was implemented as a FileInfo extension and ReadLineKeepNewLineChars a StreamReader extension (but obviously you don't have to keep it that way).
public static class FileInfoExtensions
{
public static FileInfo DeleteLines(this FileInfo source, ICollection<int> lineNumbers, string targetFilePath)
{
var lineCount = 1;
using (var streamReader = new StreamReader(source.FullName))
{
using (var streamWriter = new StreamWriter(targetFilePath))
{
string line;
while ((line = streamReader.ReadLineKeepNewLineChars()) != null)
{
if (!lineNumbers.Contains(lineCount))
{
streamWriter.Write(line);
}
lineCount++;
}
}
}
return new FileInfo(targetFilePath);
}
}
public static class StreamReaderExtensions
{
private const char EndOfFile = '\uffff';
/// <summary>
/// Reads a line, similar to ReadLine method, but keeps any
/// new line characters (e.g. "\r\n" or "\n").
/// </summary>
public static string ReadLineKeepNewLineChars(this StreamReader source)
{
if (source == null)
throw new ArgumentNullException(nameof(source));
char ch = (char)source.Read();
if (ch == EndOfFile)
return null;
var sb = new StringBuilder();
while (ch != EndOfFile)
{
sb.Append(ch);
if (ch == '\n')
break;
ch = (char)source.Read();
}
return sb.ToString();
}
}
Are you on a Unix operating system?
You can do this with the "sed" stream editor. Read the man page for "sed"
What?
Open the file, seek to the position, then overwrite the line in the stream using null.
Got it? Simple, stream-based, no memory-hungry array, and fast.
This works in VB. The example searches for the line culture=id, where culture is the name and id is the value, and changes it to culture=en.
Fileopen(1, "text.ini")
dim line as string
dim currentpos as long
while true
line = lineinput(1)
dim namevalue() as string = split(line, "=")
if namevalue(0) = "line name value that i want to edit" then
currentpos = seek(1)
fileclose()
dim fs as filestream("test.ini", filemode.open)
dim sw as streamwriter(fs)
fs.seek(currentpos, seekorigin.begin)
sw.write(null)
sw.write(namevalue + "=" + newvalue)
sw.close()
fs.close()
exit while
end if
msgbox("org ternate jua bisa, no line found")
end while
that's all..use #d

C# Cannot implicitly convert type 'string' to 'System.IO.StreamReader'

What I'm trying to do is open every text file in a directory, read it line by line, and, if a line matches specific content, apply a regex and write the result to an output file. For some reason my text files ended up being in Unicode; I'm not sure why. I was able to work around that, but I can't work around the StreamReader issue I'm having. If someone could suggest a way around this, that would be great, and if that way is to convert those text files, so be it.
Here's the code:
public void doSomeWork()
{
DirectoryInfo dinfo = new DirectoryInfo(@"C:\Documents and Settings\123");
FileInfo[] Files = dinfo.GetFiles("*.txt");
foreach (FileInfo filex in Files)
{
string line;
StreamReader sr = File.ReadAllText(filex.FullName, Encoding.Unicode);
StreamWriter sra = File.AppendText(@"C:\sharename.txt");
int counter = 0;
while((line = sr.ReadLine()) != null)
{
string matchingcontants = "Share";
if (line.Contains(matchingcontants))
{
string s = sr.ReadLine();
string sharename = Regex.Match(line, @"\+(\S*)(.)(.*)(.)").Groups[3].Value;
sra.WriteLine(sharename);
}
counter++;
}
sr.Close();
sra.Close();
}
File.ReadAllText actually reads the whole file into a string. To get a StreamReader, use new StreamReader(filex.FullName, Encoding.Unicode) instead.
File.ReadAllText returns a string containing all the text in the file.
Try changing:
StreamReader sr = File.ReadAllText(filex.FullName, Encoding.Unicode);
To
string[] lines = File.ReadAllLines(filex.FullName, Encoding.Unicode);
And changing
while((line = sr.ReadLine()) != null)
To
foreach(string line in lines)
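And remove the following lines, which no longer compile once sr is gone and line is declared by the foreach:
string line;
string s = sr.ReadLine();
sr.Close();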
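Put together, the changes suggested above could look roughly like the following sketch. It is not the asker's final code: the class and method names (ShareNameExtractor, Extract) are invented, the results are returned as a list instead of being appended to C:\sharename.txt, and File.ReadLines is used in place of File.ReadAllLines so that each file is read lazily. The regex and the "Share" filter are taken from the question unchanged.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

static class ShareNameExtractor
{
    // Scans every *.txt file in `directory` (assumed to be UTF-16 "Unicode",
    // as in the question) and collects group 3 of the question's regex
    // for each line that contains "Share".
    public static List<string> Extract(string directory)
    {
        var results = new List<string>();
        foreach (string path in Directory.GetFiles(directory, "*.txt"))
        {
            // File.ReadLines returns the lines as strings; no StreamReader needed.
            foreach (string line in File.ReadLines(path, Encoding.Unicode))
            {
                if (line.Contains("Share"))
                {
                    string shareName = Regex.Match(line, @"\+(\S*)(.)(.*)(.)").Groups[3].Value;
                    results.Add(shareName);
                }
            }
        }
        return results;
    }
}
```

The caller can then write the collected names wherever they are needed, for example with File.AppendAllLines.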

Remove Byte Order Mark from a File.ReadAllBytes (byte[])

I have an HTTPHandler that reads in a set of CSS files, combines them, and then GZips them. However, some of the CSS files contain a Byte Order Mark (due to a bug in TFS 2005 auto merge), and in Firefox the BOM is being read as part of the actual content, so it's screwing up my class names etc. How can I strip out the BOM characters? Is there an easy way to do this without manually going through the byte array looking for "ï»¿"?
Expanding on Jon's comment with a sample.
var name = GetFileName();
var bytes = System.IO.File.ReadAllBytes(name);
System.IO.File.WriteAllBytes(name, bytes.Skip(3).ToArray());
Expanding JaredPar sample to recurse over sub-directories:
using System.Linq;
using System.IO;
namespace BomRemover
{
/// <summary>
/// Remove UTF-8 BOM (EF BB BF) of all *.php files in current & sub-directories.
/// </summary>
class Program
{
private static void removeBoms(string filePattern, string directory)
{
foreach (string filename in Directory.GetFiles(directory, filePattern))
{
var bytes = System.IO.File.ReadAllBytes(filename);
if(bytes.Length > 2 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
{
System.IO.File.WriteAllBytes(filename, bytes.Skip(3).ToArray());
}
}
foreach (string subDirectory in Directory.GetDirectories(directory))
{
removeBoms(filePattern, subDirectory);
}
}
static void Main(string[] args)
{
string filePattern = "*.php";
string startDirectory = Directory.GetCurrentDirectory();
removeBoms(filePattern, startDirectory);
}
}
}
I needed that piece of C# code after discovering that the UTF-8 BOM corrupts the file when you try to do a basic PHP file download.
var text = File.ReadAllText(args.SourceFileName);
var streamWriter = new StreamWriter(args.DestFileName, args.Append, new UTF8Encoding(false));
streamWriter.Write(text);
streamWriter.Close();
Another way, assuming UTF-8 to ASCII.
File.WriteAllText(filename, File.ReadAllText(filename, Encoding.UTF8), Encoding.ASCII);
For larger files, use the following code; it is memory efficient! Note that it writes the output as UTF-16 (little endian) without a byte order mark.
StreamReader sr = new StreamReader(path: @"<Input_file_full_path_with_byte_order_mark>",
detectEncodingFromByteOrderMarks: true);
StreamWriter sw = new StreamWriter(path: @"<Output_file_without_byte_order_mark>",
append: false,
encoding: new UnicodeEncoding(bigEndian: false, byteOrderMark: false));
var lineNumber = 0;
while (!sr.EndOfStream)
{
sw.WriteLine(sr.ReadLine());
lineNumber += 1;
if (lineNumber % 100000 == 0)
Console.Write("\rLine# " + lineNumber.ToString("000000000000"));
}
sw.Flush();
sw.Close();
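If the goal is only to drop a UTF-8 BOM from a large file without changing anything else, a byte-level copy avoids both decoding and loading the whole file. The following is a sketch under that assumption (UTF-8 BOM only, EF BB BF); the names BomStripper and CopyWithoutUtf8Bom are hypothetical.

```csharp
using System.IO;

static class BomStripper
{
    // Copies inputPath to outputPath, skipping a leading UTF-8 BOM (EF BB BF)
    // if one is present. All other bytes are copied unchanged.
    public static void CopyWithoutUtf8Bom(string inputPath, string outputPath)
    {
        using (FileStream input = File.OpenRead(inputPath))
        using (FileStream output = File.Create(outputPath))
        {
            var head = new byte[3];
            int read = 0;
            // Stream.Read may return fewer bytes than requested, so loop.
            while (read < head.Length)
            {
                int n = input.Read(head, read, head.Length - read);
                if (n == 0) break; // end of file reached
                read += n;
            }

            bool hasBom = read == 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF;
            if (!hasBom)
                output.Write(head, 0, read); // not a BOM: keep these bytes

            // Stream the remainder in chunks instead of loading it all at once.
            input.CopyTo(output);
        }
    }
}
```

Input and output must be different paths here; the result can be moved over the original afterwards.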
