I have 200,000 JSON files on the file system.
Deserializing them one by one and putting them in a List takes about 4 minutes.
I am looking for the fastest way to deserialize them, or a way to deserialize them all at once.
Code Sample
The code I am using is something like this:
var files = Directory.GetFiles(@"C:\Data","*.json");
var list = new List<ParsedData>();
var dt1 = DateTime.Now;
foreach(var file in files)
{
using (StreamReader filestr = File.OpenText(file))
{
JsonSerializer serializer = new JsonSerializer();
var data= (ParsedData)serializer.Deserialize(filestr, typeof(ParsedData));
list.Add(data);
}
}
var dt2 = DateTime.Now;
Console.WriteLine((dt2 - dt1).TotalMilliseconds);
JSON format
And a sample JSON file looks like this:
{
"channel_name": "#channel",
"message": "",
"text": "",
"date": "2015/10/09 12:22:48",
"views": "83810",
"forwards": "0",
"raw_text": "",
"keywords_marked": "",
"id": 973,
"media": "1.jpg"
}
You can try using Parallel.ForEach():
var files = Directory.GetFiles(@"C:\Data", "*.json");
var list = new ConcurrentBag<ParsedData>();
var dt1 = DateTime.Now;
Parallel.ForEach(files, (file) =>
{
var filestr = File.ReadAllText(file);
var data = JsonSerializer.Deserialize<ParsedData>(filestr);
list.Add(data);
});
var dt2 = DateTime.Now;
Console.WriteLine((dt2 - dt1).TotalMilliseconds);
EDIT:
Instead of collecting all paths up front with Directory.GetFiles, you can also let Parallel.ForEach consume Directory.EnumerateFiles directly:
Parallel.ForEach(Directory.EnumerateFiles(@"C:\Data", "*.json"), (file) =>
{
var filestr = File.ReadAllText(file);
var data = JsonSerializer.Deserialize<ParsedData>(filestr);
list.Add(data);
});
That said, 50 seconds for 200,000 files already seems pretty decent.
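If you are on .NET 6 or later, you can use Parallel.ForEachAsync with asynchronous deserialization:
await Parallel.ForEachAsync( ... async (file, ct) => {
// the body receives a CancellationToken; dispose the stream once the file has been read
await using var fs = new FileStream(file, FileMode.Open, FileAccess.Read);
var data = await JsonSerializer.DeserializeAsync<ParsedData>(fs, cancellationToken: ct);
list.Add(data);
});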
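A self-contained sketch of the Parallel.ForEach approach with lazy enumeration and a Stopwatch for timing. To avoid declaring types it parses each file into a System.Text.Json JsonNode instead of the question's ParsedData class, and it generates its own hypothetical sample files in a temp folder (standing in for C:\Data); it assumes .NET 6 or later.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.IO;
using System.Text.Json.Nodes;
using System.Threading.Tasks;

// Hypothetical test data: a temp folder with 1,000 small files shaped like the sample.
string dir = Path.Combine(Path.GetTempPath(), "json-demo-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(dir);
for (int i = 0; i < 1000; i++)
{
    File.WriteAllText(Path.Combine(dir, $"{i}.json"),
        $"{{\"channel_name\":\"#channel\",\"views\":\"83810\",\"id\":{i},\"media\":\"{i}.jpg\"}}");
}

var results = new ConcurrentBag<JsonNode>();
var sw = Stopwatch.StartNew();

// Enumerate paths lazily and parse each file on a thread-pool thread;
// ConcurrentBag makes the concurrent Add calls safe.
Parallel.ForEach(Directory.EnumerateFiles(dir, "*.json"), file =>
{
    using FileStream fs = File.OpenRead(file);
    var node = JsonNode.Parse(fs);
    if (node is not null)
    {
        results.Add(node);
    }
});

sw.Stop();
Console.WriteLine($"Parsed {results.Count} files in {sw.ElapsedMilliseconds} ms");
Directory.Delete(dir, recursive: true);
```

With the real class, JsonSerializer.Deserialize<ParsedData>(fs) takes the place of JsonNode.Parse(fs). How much parallelism helps depends largely on the storage, since reading many small files tends to be I/O-bound, so it is worth timing both versions on the actual data.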
Related
I need to insert a new piece of data into a text file.
This is the method I use to read the text file:
try
{
var path = @"text file\GetAllEmp.txt";
string rawJson = File.ReadAllText(path, Encoding.UTF8);
ObservableCollection<EmployeeItem> Employee = new ObservableCollection<EmployeeItem>();
var jsonData = JsonConvert.SerializeObject(rawJson);
List<EmployeeItem> emp = JsonConvert.DeserializeObject<List<EmployeeItem>>(rawJson);
listitem.ItemsSource = emp;
I just need to add new data to the text file.
How do I add it?
What I have tried is:
public static void Writeemployee()
{
var path = @"text file\GetAllEmp.txt";
string rawJson = File.ReadAllText(path);
List<EmployeeItem> emp = JsonConvert.DeserializeObject<List<EmployeeItem>>(rawJson);
var abs = emp;
for (int i = 0; i < abs.Count; i++)
{
EmployeeItem s_Item = new EmployeeItem();
int SID = ((int)s_Item.SiteID);
DataAccess.AddEmployee(s_Item);
}
}
My data access:
public static async void AddEmployee(EmployeeItem Employee)
{
}
I just don't know how to insert. If there is any other method to insert, please let me know.
With the UWP file APIs you cannot add items to the JSON file in place; the file has to be rewritten as a whole.
Because the file holds a JSON array ([{item1},{item2}]), simply appending text to the end would produce invalid JSON. Instead, read all the items, add the new element to the list, serialize the list back to JSON, and write it to the file.
Here is a code sample.
EmployeeItem employeeItem = new EmployeeItem
{
Id = 8,
GroupID = 18,
SiteID = 5565
};
StorageFolder appFolder = Windows.Storage.ApplicationData.Current.LocalFolder;
string path = "GetAllEmp.txt";
StorageFile sampleFile = await appFolder.GetFileAsync(path);
//get data from the same file that is overwritten below
string rawJson = await Windows.Storage.FileIO.ReadTextAsync(sampleFile);
List<EmployeeItem> emp = JsonConvert.DeserializeObject<List<EmployeeItem>>(rawJson);
emp.Add(employeeItem);
//write the whole updated list back, replacing the old content
await Windows.Storage.FileIO.WriteTextAsync(sampleFile, JsonConvert.SerializeObject(emp));
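The same read, add, rewrite cycle works outside UWP as well. The sketch below is a stand-in under stated assumptions rather than the answer's code: it uses System.Text.Json's JsonNode/JsonArray (.NET 6+) instead of Json.NET and an EmployeeItem class, and a hypothetical temp file in place of GetAllEmp.txt.

```csharp
using System;
using System.IO;
using System.Text.Json;
using System.Text.Json.Nodes;

// Hypothetical stand-in for GetAllEmp.txt: a JSON array with one employee.
string path = Path.Combine(Path.GetTempPath(), "GetAllEmp-demo.txt");
File.WriteAllText(path, "[{\"Id\":1,\"GroupID\":10,\"SiteID\":100}]");

// 1. Read and parse the whole array (an empty array if the file holds "null").
JsonArray employees = JsonNode.Parse(File.ReadAllText(path))?.AsArray() ?? new JsonArray();

// 2. Add the new item to the in-memory array.
JsonNode newEmployee = new JsonObject
{
    ["Id"] = 8,
    ["GroupID"] = 18,
    ["SiteID"] = 5565
};
employees.Add(newEmployee);

// 3. Serialize the complete array and overwrite the file.
File.WriteAllText(path, employees.ToJsonString(new JsonSerializerOptions { WriteIndented = true }));

Console.WriteLine(File.ReadAllText(path));
```

Rewriting the complete array keeps the file valid JSON after every change, which appending raw text would not.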
I have code that copies a document to a specific location. To test it, I have code that creates a fake file, as below:
private static Mock<IFormFile> GetMockFormFile()
{
var fileMock = new Mock<IFormFile>();
var fileStreamProviderMock = new Mock<IFileStreamProvider>();
var contentTypes = "text/plain";
var content = "Fake File";
var fileName = "Test.txt";
var disk = new MemoryStream();
var writers = new StreamWriter(disk);
writers.Write(content);
writers.Flush();
disk.Position = 0;
fileMock.Setup(_ => _.OpenReadStream()).Returns(disk);
fileMock.Setup(_ => _.FileName).Returns(fileName);
fileMock.Setup(_ => _.Length).Returns(disk.Length);
fileMock.Setup(_ => _.ContentType).Returns(contentTypes);
fileMock.Setup(_ => _.ContentDisposition).Returns($"form-data;name='file';filename ='{fileName}'");
fileStreamProviderMock.Setup(_ => _.Create(It.IsAny<string>())).Returns(disk);
fileStreamProviderMock.Setup(_ => _.Open(It.IsAny<string>())).Returns(disk);
fileMock.Setup(_ => _.CopyToAsync(disk, It.IsAny<CancellationToken>())).Returns(Task.CompletedTask);
fileMock.Verify();
return fileMock;
}
I pass this fake file to the controller like this:
[Fact]
public async Task Sample()
{
var formFile = GetMockFormFile();
var file = formFile.Object;
var SavedFile = new AttachmentMetadata
{
Id = 4231,
DocumentName = file.FileName,
DocumentType = file.ContentType,
CreatedDateTime = dateTime,
ModifiedDateTime = dateTime
};
var viewModel = new AttachmentViewModel()
{
File = new List<IFormFile> { file }
};
await controller.AddAttachment(viewModel);
fileStreamProviderMock.Verify(_ => _.Create(@"C:\TestFolder\Test.txt"));
formFile.Verify(_ => _.CopyToAsync(It.IsAny<Stream>(), It.IsAny<CancellationToken>()));
}
and inside my controller I have code like this:
foreach (var file in model.File)
{
using (var ms = new MemoryStream())
{
file.CopyTo(ms);
fileBytes = ms.ToArray();
}
----
----
}
This should give a non-empty fileBytes, but currently fileBytes has length 0, which causes issues further along in the code.
Is there another way to mock the file so that it has a non-zero byte size?
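A likely reason for the 0 bytes: the controller calls the synchronous CopyTo(Stream), which the mock never sets up, so with Moq's default loose behavior the call does nothing and ms stays empty. The existing CopyToAsync setup only matches when it is passed the exact disk stream, and it returns Task.CompletedTask without writing anything either. A sketch that makes both copy methods write the fake content, assuming a test project that references Moq and ASP.NET Core (for IFormFile):

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading;
using Microsoft.AspNetCore.Http;
using Moq;

var bytes = Encoding.UTF8.GetBytes("Fake File");
var fileMock = new Mock<IFormFile>();
fileMock.Setup(f => f.FileName).Returns("Test.txt");
fileMock.Setup(f => f.ContentType).Returns("text/plain");
fileMock.Setup(f => f.Length).Returns((long)bytes.Length);

// Hand out a fresh stream on every call so each read starts at position 0.
fileMock.Setup(f => f.OpenReadStream()).Returns(() => new MemoryStream(bytes));

// The synchronous copy used by the controller now writes the fake content.
fileMock.Setup(f => f.CopyTo(It.IsAny<Stream>()))
        .Callback<Stream>(target => target.Write(bytes, 0, bytes.Length));

// Same for the async copy, for any target stream and token.
fileMock.Setup(f => f.CopyToAsync(It.IsAny<Stream>(), It.IsAny<CancellationToken>()))
        .Returns<Stream, CancellationToken>((target, ct) => target.WriteAsync(bytes, 0, bytes.Length, ct));

// What the controller does with the file:
using var ms = new MemoryStream();
fileMock.Object.CopyTo(ms);
byte[] fileBytes = ms.ToArray();
Console.WriteLine(fileBytes.Length);
```

Alternatively, ASP.NET Core's concrete FormFile class can wrap a MemoryStream directly, which avoids having to mock the copy methods at all.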
I need to add one more node to a JSON string.
The following code is where I read the data:
var url = "https://xyz_12232_abc/0908978978979.json";
var sys = new WebClient();
var content = sys.DownloadString(url);
The code above returns the following output:
{
"2312312312313":
{
"emailId":"abc@gmail.com",
"model":"XYZ001",
"phone":"+654784512547",
"userName":"User1"
},
"23456464512313":
{
"emailId":"abcd@gmail.com",
"model":"XYZ002",
"phone":"+98745114474",
"userName":"User2"
},
"45114512312313":
{
"emailId":"abcde@gmail.com",
"model":"XYZ3",
"phone":"+214784558741",
"userName":"User3"
}
}
But I want the output to look like this:
{
"Records": [
{
"UID":"2312312312313",
"emailId":"abc@gmail.com",
"model":"XYZ001",
"phone":"+654784512547",
"userName":"User1"
},
{
"UID":"23456464512313",
"emailId":"abcd@gmail.com",
"model":"XYZ002",
"phone":"+98745114474",
"userName":"User2"
},
{
"UID":"45114512312313",
"emailId":"abcde@gmail.com",
"model":"XYZ3",
"phone":"+214784558741",
"userName":"User3"
}
]
}
How can this be achieved?
You can use Json.NET to massage the data into your desired output:
var jsonStr = @"..."; // your JSON here
var obj = JsonConvert.DeserializeObject<Dictionary<string, JObject>>(jsonStr);
var formattedObj = new
{
Records = obj.Select(x =>
{
x.Value.AddFirst(new JProperty("UID", x.Key));
return x.Value;
})
};
// serialize back to JSON
var formattedJson = JsonConvert.SerializeObject(formattedObj);
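For comparison, the same reshaping can be sketched with System.Text.Json's mutable DOM instead of Json.NET. This is a swapped-in library, not the answer's code; it assumes .NET 8 or later (JsonNode.DeepClone was added in .NET 8) and uses a trimmed, hypothetical copy of the downloaded string as input.

```csharp
using System;
using System.Text.Json.Nodes;

// Trimmed, hypothetical copy of the downloaded string: keys are the UIDs.
string content = @"{
  ""2312312312313"": { ""emailId"": ""abc@gmail.com"", ""model"": ""XYZ001"", ""phone"": ""+654784512547"", ""userName"": ""User1"" },
  ""23456464512313"": { ""emailId"": ""abcd@gmail.com"", ""model"": ""XYZ002"", ""phone"": ""+98745114474"", ""userName"": ""User2"" }
}";

JsonObject source = JsonNode.Parse(content)?.AsObject() ?? new JsonObject();
var records = new JsonArray();

foreach (var (uid, value) in source)
{
    // Each record starts with the UID taken from the property name...
    JsonNode record = new JsonObject { ["UID"] = uid };

    // ...followed by copies of the original fields (a node can have only one parent).
    foreach (var (name, field) in value?.AsObject() ?? new JsonObject())
    {
        record[name] = field?.DeepClone();
    }

    records.Add(record);
}

string formattedJson = new JsonObject { ["Records"] = records }.ToJsonString();
Console.WriteLine(formattedJson);
```

Depending on the encoder settings, System.Text.Json may write some characters escaped (for example + in the phone numbers as \u002B); the values are the same when the JSON is parsed again.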
I am using HttpClient to call a RESTful service. My problem is with parsing a DateTime.
My class has a DateTime property, but in the JSON the corresponding value is a long. The JSON key is exp:
{
"resultCodes": "OK",
"description": "OK",
"pans": {
"maskedPan": [
{
"id": "4b533683-bla-bla-3517",
"pan": "67*********98",
"exp": 1446321600000,
"isDefault": true
},
{
"id": "a3093f00-zurna-01e18a8d4d72",
"pan": "57*********96",
"exp": 1554058800000,
"isDefault": false
}
]
}
}
In the documentation I read that
To minimize memory usage and the number of objects allocated Json.NET supports serializing and deserializing directly to a stream.
So:
WAY 1 (reading via GetStringAsync). The documentation says to use a StreamReader instead.
return Task.Factory.StartNew(() =>
{
var client = new HttpClient(_handler);
var url = String.Format(_baseUrl + @"list/{0}", sessionId);
BillsList result;
var rrrrr = client.GetStringAsync(url).Result;
result = JsonConvert.DeserializeObject<BillsList>(rrrrr,
new MyDateTimeConverter());
return result;
}, cancellationToken);
WAY 2 (a good way: I read via a StreamReader. But the line var rTS = sr.ReadToEnd(); creates a new string, which is not good, because I used GetStreamAsync precisely to avoid creating a string variable.)
return Task.Factory.StartNew(() =>
{
var client = new HttpClient(_handler);
var url = String.Format(_baseUrl + @"list/{0}", sessionId);
BillsList result;
using (var s = client.GetStreamAsync(url).Result)
using (var sr = new StreamReader(s))
using (JsonReader reader = new JsonTextReader(sr))
{
var rTS = sr.ReadToEnd();
result = JsonConvert.DeserializeObject<BillsList>(rTS,
new MyDateTimeConverter());
}
return result;
}, cancellationToken);
WAY 3 (the best, but it throws an exception if the property in my class is a DateTime.)
return Task.Factory.StartNew(() =>
{
var client = new HttpClient(_handler);
var url = String.Format(_baseUrl + @"list/{0}", sessionId);
BillsList result;
using (var s = client.GetStreamAsync(url).Result)
using (var sr = new StreamReader(s))
using (JsonReader reader = new JsonTextReader(sr))
{
var serializer = new JsonSerializer();
result = serializer.Deserialize<BillsList>(reader);
}
return result;
}, cancellationToken);
So my question: I want to continue with the third way. Is there any way to register a handler such as MyDateTimeConverter with the JsonSerializer so that it converts the value automatically?
You can set up default JsonSerializerSettings when your app is initialized:
// This needs to be done only once, so put it in an appropriate static initializer.
JsonConvert.DefaultSettings = () => new JsonSerializerSettings
{
Converters = new List<JsonConverter> { new MyDateTimeConverter() }
};
Then, later on, you can use JsonSerializer.CreateDefault():
JsonSerializer serializer = JsonSerializer.CreateDefault();
result = serializer.Deserialize<BillsList>(reader);
You can add your MyDateTimeConverter to the Converters collection on the JsonSerializer; that should allow you to use your third approach without getting errors.
var serializer = new JsonSerializer();
serializer.Converters.Add(new MyDateTimeConverter());
result = serializer.Deserialize<BillsList>(reader);
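MyDateTimeConverter itself is not shown in the question. Whatever else it does, its read side has to turn exp into a DateTime; assuming exp counts milliseconds since the Unix epoch (13-digit values like these usually do), that conversion is DateTimeOffset.FromUnixTimeMilliseconds. The sketch below only demonstrates that conversion, using System.Text.Json's JsonNode to pull the sample values out of a trimmed copy of the response.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;

// Trimmed copy of the response from the question (only id and exp kept).
string json = @"{ ""pans"": { ""maskedPan"": [
  { ""id"": ""4b533683-bla-bla-3517"", ""exp"": 1446321600000 },
  { ""id"": ""a3093f00-zurna-01e18a8d4d72"", ""exp"": 1554058800000 }
] } }";

// Assumption: exp counts milliseconds since 1970-01-01T00:00:00Z.
static DateTime FromUnixMilliseconds(long ms) =>
    DateTimeOffset.FromUnixTimeMilliseconds(ms).UtcDateTime;

var expiries = new List<DateTime>();
JsonArray pans = JsonNode.Parse(json)?["pans"]?["maskedPan"]?.AsArray() ?? new JsonArray();
foreach (var pan in pans)
{
    long? exp = pan?["exp"]?.GetValue<long>();
    if (exp is long ms)
    {
        DateTime expiry = FromUnixMilliseconds(ms);
        expiries.Add(expiry);
        Console.WriteLine(expiry.ToString("O"));
    }
}
```

The two sample values decode to 2015-10-31 20:00 UTC and 2019-03-31 19:00 UTC. The same one-liner is what a Json.NET converter's ReadJson would return when the reader is positioned on the long token.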
I have this code. It finds the right item, but it doesn't save the modification. What can I do?
using (StreamReader r = new StreamReader("C:/Files/generated.json"))
{
string json = r.ReadToEnd();
var result = JsonConvert.DeserializeObject<List<Form>>(json);
foreach (var item in result)
{
if (item.id == FormtoSave.id)
{
item.Title = FormtoSave.Title;
item.body = FormtoSave.body;
}
}
}
After modifying the item's Title and body, you have to serialize the object back to JSON and write the JSON string to the file:
string json = JsonConvert.SerializeObject(result);
using (TextWriter writer = new StreamWriter("c:\\fileName.json"))
{
writer.Write(json);
}
Try this to convert your modified object back to JSON:
string jsonOutput= JsonConvert.SerializeObject(result);
Edit:
To save the string to a file, use this:
string path = @"c:\output.json";
File.WriteAllText(path, jsonOutput);
You need to save the changes back to the file:
string resultJson = String.Empty;
using (StreamReader r = new StreamReader("C:/Files/generated.json"))
{
string json = r.ReadToEnd();
var result = JsonConvert.DeserializeObject<List<Form>>(json);
foreach (var item in result)
{
if (item.id == FormtoSave.id)
{
item.Title = FormtoSave.Title;
item.body = FormtoSave.body;
}
}
resultJson = JsonConvert.SerializeObject(result);
}
File.WriteAllText("C:/Files/generated.json", resultJson);
I do the writing outside the using block so that the file is no longer locked by the StreamReader.
Or, without a StreamReader:
string path = "C:/Files/generated.json";
var result = JsonConvert.DeserializeObject<List<Form>>(File.ReadAllText(path));
foreach (var item in result)
{
if (item.id == FormtoSave.id)
{
item.Title = FormtoSave.Title;
item.body = FormtoSave.body;
}
}
File.WriteAllText(path, JsonConvert.SerializeObject(result));
The example below may help:
List<data> _data = new List<data>();
_data.Add(new data()
{
Id = 1,
SSN = 2,
Message = "A Message"
});
string json = JsonConvert.SerializeObject(_data.ToArray());
//write string to file
System.IO.File.WriteAllText(@"D:\path.txt", json);