Taglib Performance issue - c#

I want to read bulk audio file's tags faster. currently i am able to store 5000 audio files info to struct list in about 7 seconds.
issue is when i select folder having 20,000 or 40,000 files the app keeps on running and doesn't notify that the process is done. whereas when it reads 5,000 files it shows the message box prompting "Done loading files 5000" in 7 seconds.
Here is my code:
public struct SongInfoStruct
{
public string Key;
public string Value;
public string Artist;
public double Duration;
public string Comments;
public string Album;
public string Missing;
};
public async Task<SongInfoStruct> GetSongInfo(string url)
{
var songinfo = (dynamic)null;
var tagFile = TagLib.File.Create(url);
var songName = (dynamic)null;
var artist = (dynamic)null;
var album = (dynamic)null;
var comments = (dynamic)null;
var duration = (dynamic)null;
await Task.Run(() =>
{
songName = tagFile.Tag.Title;
artist = tagFile.Tag.FirstPerformer;
album = tagFile.Tag.Album;
comments = tagFile.Tag.Comment;
duration = tagFile.Properties.Duration.TotalSeconds;
});
return songinfo = new SongInfoStruct
{
Key = url,
Value = songName,
Artist = artist,
Duration = duration,
Comments = comments,
Album = album,
Missing = " "
};
}
public async Task<List<SongInfoStruct>> ReadPathFromSource(string Source)
{
var files = Directory.EnumerateFiles(Source, "*",
SearchOption.AllDirectories).Where(s => s.EndsWith(".mp3") ||
s.EndsWith(".m4a"));
int length = files.Count();
var listpaths = new List<SongInfoStruct>(length);
listpaths.Clear();
foreach (string PathsClickSong_temp in files)
{
var item = await GetSongInfo(PathsClickSong_temp);
await Task.Run(() =>
{
listpaths.Add(item);
});
}
MessageBox.Show("Done loading files "+ listpaths.Count.ToString());
return listpaths;
}

As a good practice, always try to set ConfigureAwait to false, unless you want to use same caller context.
var item = await GetSongInfo(PathsClickSong_temp).ConfigureAwait(false);
Here, you can try it using Task.WhenAll instead, which gets multiple data concurrently.
Task.Run is good to use when there is CPU intense work, for detail please have a look at When correctly use Task.Run and when just async-await
var listpaths = new List<Task<SongInfoStruct>>(length);
listpaths.Clear();
foreach (string PathsClickSong_temp in files)
{
var item = GetSongInfo(PathsClickSong_temp);
listpaths.Add(item);
}
await Task.WhenAll(listpaths);

Related

How to have an AWS Lambda/Rekognition Function return an array of object keys

This feels like a simple question and I feel like I am overthinking it. I am doing an AWS project that will compare face(s) on an image to a database (s3bucket) of other faces. So far, I have a lambda function for the comparefacerequest, a class library which invokes the function, and an UWP that inputs the image file and outputs a result. It has worked so far being based on boolean (true or false) functions, but now I want it to instead return what face(s) are recognized via an array. I struggling at implementing this.
Below is my lambda function. I have adjusted the task to be an Array instead of a bool and changed the return to be an array. At the bottom, I have created a global variable class with a testing array so I could attempt to reference the array elsewhere.
public class Function
{
//Function
public async Task<Array> FunctionHandler(string input, ILambdaContext context)
{
//number of matched faces
int matched = 0;
//Client setup
var rekognitionclient = new AmazonRekognitionClient();
var s3client = new AmazonS3Client();
//Create list of target images
ListObjectsRequest list = new ListObjectsRequest
{
BucketName = "bucket2"
};
ListObjectsResponse listre = await s3client.ListObjectsAsync(list);
//loop of list
foreach (Amazon.S3.Model.S3Object obj in listre.S3Objects)
{
//face request with input and obj.key images
var comparefacesrequest = new CompareFacesRequest
{
SourceImage = new Image
{
S3Object = new S3Objects
{
Bucket = "bucket1",
Name = input
}
},
TargetImage = new Image
{
S3Object = new S3Objects
{
Bucket = "bucket2",
Name = obj.Key
}
},
};
//compare with confidence of 95 (subject to change) to current target image
var detectresponse = await rekognitionclient.CompareFacesAsync(comparefacesrequest);
detectresponse.FaceMatches.ForEach(match =>
{
ComparedFace face = match.Face;
if (match.Similarity > 95)
{
//if face detected, raise matched
matched++;
for(int i = 0; i < Globaltest.testingarray.Length; i++)
{
if (Globaltest.testingarray[i] == "test")
{
Globaltest.testingarray[i] = obj.Key;
}
}
}
});
}
//Return true or false depending on if it is matched
if (matched > 0)
{
return Globaltest.testingarray;
}
return Globaltest.testingarray;
}
}
public static class Globaltest
{
public static string[] testingarray = { "test", "test", "test" };
}
Next, is my invoke request in my class library. It has so far been based on the lambda outputting a boolean result, but I thought, "hey, it is parsing the result, it should be fine, right"? I do convert the result to a string, as there is no GetArray, from what I know.
public async Task<bool> IsFace(string filePath, string fileName)
{
await UploadS3(filePath, fileName);
AmazonLambdaClient client = new AmazonLambdaClient(accessKey, secretKey, Amazon.RegionEndpoint.USWest2);
InvokeRequest ir = new InvokeRequest();
ir.InvocationType = InvocationType.RequestResponse;
ir.FunctionName = "ImageTesting";
ir.Payload = "\"" + fileName + "\"";
var result = await client.InvokeAsync(ir);
var strResponse = Encoding.ASCII.GetString(result.Payload.ToArray());
if (bool.TryParse(strResponse, out bool result2))
{
return result2;
}
return false;
}
Finally, here is the section of my UWP where I perform the function. I am referencing the lambda client via "using Lambdaclienttest" (name of lamda project, and this is its only instance I use the reference though). When I run my project, I do still get a face detected when it should, but the Globaltest.testingarray[0] is still equal to "test".
var Facedetector = new FaceDetector(Credentials.accesskey, Credentials.secretkey);
try
{
var result = await Facedetector.IsFace(filepath, filename);
if (result)
{
textBox1.Text = "There is a face detected";
textBox2.Text = Globaltest.testingarray[0];
}
else
{
textBox1.Text = "Try Again";
}
}
catch
{
textBox1.Text = "Please use a photo";
}
Does anyone have any suggestions?

c# .netcore List all files under amazon s3 folder

Using C# and amazon .Net core, able to list all the files with in a amazon S3 folder as below:
public async Task<string> GetMenuUrl(entities.Restaurant restaurant)
{
AmazonS3Client s3Client = new AmazonS3Client(_appSettings.AWSPublicKey, _appSettings.AWSPrivateKey, Amazon.RegionEndpoint.APSoutheast2);
string imagePath;
string restaurantName = trimSpecialCharacters(restaurant.Name);
int restaurantId = restaurant.RestaurantId;
ListObjectsRequest listRequest = new ListObjectsRequest();
ListObjectsResponse listResponse;
imagePath = $"Business_menu/{restaurantId}/";
listRequest.BucketName = _appSettings.AWSS3BucketName;
listRequest.Prefix = imagePath;
do
{
listResponse = await s3Client.ListObjectsAsync(listRequest);
} while (listResponse.IsTruncated);
var files = listResponse.S3Objects.Select(x => x.Key);
var arquivos = files.Select(x => Path.GetFileName(x)).ToList();
return arquivos.ToString();
}
Currently arquivos returns a list containing both the images (image1.jpg, image2.jpg) which is as expected and then I return it as a string.
But when I go to call this method from another function.
public async Task<VenueMenuResponse> GetVenueMenuUrl(int restaurantId)
{
var restaurant = await _context.Restaurant.Where(w => w.RestaurantId == restaurantId).FirstOrDefaultAsync();
var result = await _storyService.GetMenuUrl(restaurant);
var response = new MenuResponse() //just contains string variable called MenuUrl
{
MenuUrl = result
};
return response;
}
It returns this:
{
"menuUrl": "System.Collections.Generic.List`1[System.String]"
}
When I want it to return
{
"menuUrl": "Image1.jpg"
},
{
"menuUrl": "Image2.jpg"
}
You need to iterate thought results and return list of results.
public async Task<IEnumerable<VenueMenuResponse>> GetVenueMenuUrl(int restaurantId)
{
var restaurant = await _context.Restaurant.Where(w => w.RestaurantId == restaurantId).FirstOrDefaultAsync();
var result = await _storyService.GetMenuUrl(restaurant);
var response = result.Select(e => new MenuResponse() //just contains string variable called MenuUrl
{
MenuUrl = e
};
return response;
}
var result = string.Join(", ", fileName);
The default ToString() implementation of List simply prints the name of the type.
This magical line managed to solve things for me. The list was originally outputting just the object instead of the actual content of the list.

API is mixing up data from different devices

I have an API that has devices firing data to it at the same time or within a few milliseconds. What I am finding is that the data is getting mixed up. The data is sent every five minutes (on the clock 05, 10, 15 etc.) I have an execution filter that traps the URL data coming in so I always have a real source, then it goes to the endpoint and then onto processing. For example, there will a be random five minute period missing. When I debug step by step with the missing URL from the execution filter it works fine. By that I mean I take the URL and debug, then it inserts.
In summary, I have device id 1 and device id 2.I will get missing intervals even though, I can see the data has hit the execution filter.
I am assuming that the API is not handling these as separate transactions, but somehow mixing them up together, hence the data missing and the serial numbers appearing in the wrong place, such that data from id 1 is appearing in id 2 vice versa etc.
API End Point:
public class SomeController : ApiController
{
[HttpGet]
[ExecutionFilter]
public async Task<HttpResponseMessage> Get([FromUri] FixedDataModel fdm)
{
var reply = new HttpResponseMessage();
string url = HttpUtility.UrlDecode(HttpContext.Current.Request.QueryString.ToString());
if (url.Contains("timestamp"))
{
reply = TimeSyncValidation.TimeSync;
return reply;
}
else if (!url.Contains("timestamp"))
{
reply = await Task.Run(() => DeviceClass.DeviceApiAsync(fdm, url));
}
return reply;
}
}
Processing class:
namespace API.Services
{
public class DeviceClass
{
private static string serialNumber;
private static byte chk;
private static string channelName, channelReadingNumber, channelValue, queryString, readingDate;
private static int colonPosition, chanCountFrom, equalsPosition;
private static bool checkSumCorrect;
public static HttpResponseMessage DeviceApiAsync(FixedDataModel fdm, string urlQqueryString)
{
Guid guid = Guid.NewGuid();
//ExecutionTrackerHandler.Guid = guid;
//Remove question mark
var q = urlQqueryString;
queryString = q.Substring(0);
var items = HttpUtility.ParseQueryString(queryString);
serialNumber = items["se"];
//Store raw uri for fault finding
var rawUri = new List<RawUriModel>
{
new RawUriModel
{
UniqueId = guid,
RawUri = q,
TimeStamp = DateTime.Now
}
};
//Checksum validation
chk = Convert.ToByte(fdm.chk);
checkSumCorrect = CheckSumValidator.XorCheckSum(queryString, chk);
if (!checkSumCorrect)
{
return ValidationResponseMessage.ResponseHeaders("Checksum");
}
//Create list of items that exist in URL
var urldata = new UrlDataList
{
UrlData = queryString.Split('&').ToList(),
};
var data = new List<UriDataModel>();
//Split the URL string into its parts
foreach (var item in urldata.UrlData)
{
colonPosition = item.IndexOf(":");
chanCountFrom = colonPosition + 1;
equalsPosition = item.LastIndexOf("=");
if (colonPosition == -1)
{
channelName = item.Substring(0, equalsPosition);
channelReadingNumber = "";
channelValue = item.Substring(item.LastIndexOf("=") + 1);
}
else
{
channelName = item.Substring(0, colonPosition);
channelReadingNumber = item.Substring(chanCountFrom, equalsPosition - chanCountFrom);
channelValue = item.Substring(item.LastIndexOf("=") + 1);
if (channelName == "atime" || channelName == "adate")
{
readingDate = DateValidator.CreateDate(channelValue);
}
};
bool nullFlag = false;
if (channelValue == null)
nullFlag = true;
bool missingFlag = false;
if (channelValue == "x") {
missingFlag = true;
channelValue = "0";
}
//Add data to model ready for DB insert.
data.Add(new UriDataModel
{
uid = guid,
SerialNumber = serialNumber,
ChannelName = channelName,
ChannelReadingNumber = channelReadingNumber,
ChannelValue = channelValue.Replace(",", "."),
ReadingDate = readingDate,
TimeStamp = DateTime.Now.ToString("yyyy-MM-dd HH:mm"),
Processed = false,
NullFlag = nullFlag,
MissingFlag = missingFlag
});
};
//Validate dates
var allDates = (from x in data where x.ChannelName.Contains("atime") || x.ChannelName.Contains("adate") select x.ChannelValue).ToList();
bool dateValidation = DateValidator.IsValid(allDates);
if (!dateValidation)
{
return ValidationResponseMessage.ResponseHeaders("Date");
};
//Validate values
var channels = Enum.GetNames(typeof(Channels)).ToList();
List<string> allChannelValues = data.Where(d => channels.Contains(d.ChannelName)).Select(d => d.ChannelValue).ToList();
bool valueValidation = ValueValidator.IsValid(allChannelValues);
if (!valueValidation)
{
return ValidationResponseMessage.ResponseHeaders("Values");
};
//Insert live data
var insertData = DataInsert<UriDataModel>.InsertData(data, "Staging.UriData");
if (!insertData)
{
return ValidationResponseMessage.ResponseHeaders("Sql");
}
var content = "\r\nSUCCESS\r\n";
var reply = new HttpResponseMessage(System.Net.HttpStatusCode.OK)
{
Content = new StringContent(content)
};
return reply;
}
}
}
TIA
You are using global variables and static method to process your data.
Change your method to non-static.
Each DeviceClass worker must update only its own isolated data then push that off back to controller.

Append (n) to file names (not using system.io.file) to ensure unique names

I have a class that holds the name of a file, and the data of a file:
public class FileMeta
{
public string FileName { get; set; }
public byte[] FileData { get; set; }
}
I have a method that populates a collection of this class through an async download operation (Files are not coming from a local file system):
async Task<List<FileMeta>> ReturnFileData(IEnumerable<string> urls)
{
using (var client = new HttpClient())
{
var results = await Task.WhenAll(urls.Select(async url => new
{
FileName = Path.GetFileName(url),
FileData = await client.GetByteArrayAsync(url),
}));
return results.Select(result =>
new FileMeta
{
FileName = result.FileName,
FileData = result.FileData
}).ToList();
}
}
I am going to feed this List<FileMeta> into a ZipFile creator, and the ZipFile like all File containers needs unique file names.
Readability is important, and I would like to be able to do the following:
file.txt => file.txt
file.txt => file(1).txt
file.txt => file(2).txt
There are a number of examples on how to do this within the file system, but not with a simple object collection. (Using System.IO.File.Exists for example)
What's the best way to loop through this collection of objects and return a unique set of file names?
How far I've gotten
private List<FileMeta> EnsureUniqueFileNames(IEnumerable<FileMeta> fileMetas)
{
var returnList = new List<FileMeta>();
foreach (var file in fileMetas)
{
while (DoesFileNameExist(file.FileName, returnList))
{
//Append (n) in sequence until match is not found?
}
}
return returnList;
}
private bool DoesFileNameExist(string fileName, IEnumerable<FileMeta> fileMeta)
{
var fileNames = fileMeta.Select(file => file.FileName).ToList();
return fileNames.Contains(fileName);
}
You can try the following to increment the filenames:
private List<FileMeta> EnsureUniqueFileNames(IEnumerable<FileMeta> fileMetas)
{
var returnedList = new List<FileMeta>();
foreach (var file in fileMetas)
{
int count = 0;
string originalFileName = file.FileName;
while (returnedList.Any(fileMeta => fileMeta.FileName.Equals(file.FileName,
StringComparison.OrdinalIgnoreCase))
{
string fileNameOnly = Path.GetFileNameWithoutExtension(originalFileName);
string extension = Path.GetExtension(file.FileName);
file.FileName = string.Format("{0}({1}){2}", fileNameOnly, count, extension);
count++;
}
returnList.Add(file);
}
return returnList;
}
As a side note, in your ReturnFileData, you're generating two lists, one of anonymous type and one of your actual FileMeta type. You can reduce the creation of the intermediate list. Actually, you don't need to await inside the method at all:
private Task<FileMeta[]> ReturnFileDataAsync(IEnumerable<string> urls)
{
var client = new HttpClient();
return Task.WhenAll(urls.Select(async url => new FileMeta
{
FileName = Path.GetFileName(url),
FileData = await client.GetByteArrayAsync(url),
}));
}
I made the return type a FileMeta[] instead of a List<FileMeta>, as it is a fixed sized returning anyway, and reduces the need to call ToList on the returned array. I also added the Async postfix, to follow the TAP guidelines.

Creating Large List<T>

I have the following code:
var lstMusicInfo = new List<MediaFile>();
var LocalMusic = Directory.EnumerateFiles(AppSettings.Default.ComputerMusicFolder, "*.*", SearchOption.AllDirectories).AsParallel().ToList<string>();
LocalMusic = (from a in LocalMusic.AsParallel()
where a.EndsWith(".mp3") || a.EndsWith(".wma")
select a).ToList<string>();
var DeviceMusic = adb.SyncMedia(this.dev, AppSettings.Default.ComputerMusicFolder, 1);
Parallel.ForEach(LocalMusic, new Action<string>(item =>
{
try
{
UltraID3 mFile = new UltraID3();
FileInfo fInfo;
mFile.Read(item);
fInfo = new FileInfo(item);
bool onDevice = true;
if (DeviceMusic.Contains(item))
{
onDevice = false;
}
// My Problem starts here
lstMusicInfo.Add(new MediaFile()
{
Title = mFile.Title,
Album = mFile.Album,
Year = mFile.Year.ToString(),
ComDirectory = fInfo.Directory.FullName,
FileFullName = fInfo.FullName,
Artist = mFile.Artist,
OnDevice = onDevice,
PTDevice = false
});
//Ends here.
}
catch (Exception) { }
}));
this.Dispatcher.BeginInvoke(new Action(() =>
{
lstViewMusicFiles.ItemsSource = lstMusicInfo;
blkMusicStatus.Text = "";
doneLoading = true;
}));
#endregion
}));
The first part of the code gives me almost instant result containing:
Address on computer of 5780 files.
Get list of all music files on an android device compare it with those 5780 files and return a list of files found on computer but not on device (in my case it returns a list with 5118 items).
The block of code below is my problem, I am filling data into a class, then adding that class into a List<T>, doing it for 5780 times takes 60 seconds, how can I improve it?
// My Problem starts here
lstMusicInfo.Add(new MediaFile
{
Title = mFile.Title,
Album = mFile.Album,
Year = mFile.Year.ToString(),
ComDirectory = fInfo.Directory.FullName,
FileFullName = fInfo.FullName,
Artist = mFile.Artist,
OnDevice = onDevice,
PTDevice = false
});
//Ends here.
Update:
Here is the profiling result and I see it's obvious why it's slowing down >_>
I suppose I should look for a different library that reads music file information.
One way to avoid loading everything once, up front, would be to lazy load the ID3 information as necessary.
You'd construct your MediaFile instances thus...
new MediaFile(filePath)
...and MediaFile would look something like the following.
internal sealed class MediaFile
{
private readonly Lazy<UltraID3> _lazyFile;
public MediaFile(string filePath)
{
_lazyFile = new Lazy<UltraID3>(() =>
{
var file = new UltraID3();
file.Read(filePath);
return file;
});
}
public string Title
{
get { return _lazyFile.Value.Title; }
}
// ...
}
This is possibly less ideal than loading them as fast as you can in the background, if you do something like MediaFiles.OrderBy(x => x.Title).ToList() and nothing has been lazy loaded then you'll have to wait for every file to load.
Loading them in the background would make them available for use immediately after the background loading has finished. But you might have to concern yourself with not accessing some items until the background loading has finished.
You biggest bottleneck is new FileInfo(item), but you don't need FileInfo just to get the Directory and File names. You can use Path.GetDirectoryName and Path.GetFileName, which are must faster since no I/O is involved.
UltraID3 mFile = new UltraID3();
//FileInfo fInfo;
mFile.Read(item);
//fInfo = new FileInfo(item);
bool onDevice = true;
if (DeviceMusic.Contains(item))
{
onDevice = false;
}
// My Problem starts here
lstMusicInfo.Add(new MediaFile()
{
Title = mFile.Title,
Album = mFile.Album,
Year = mFile.Year.ToString(),
ComDirectory = Path.GetDirectoryName(item), // fInfo.Directory.FullName,
FileFullName = Path.GetFileName(item), //fInfo.FullName,
Artist = mFile.Artist,
OnDevice = onDevice,
PTDevice = false
});
//Ends here.

Categories