How can I handle mismatch between schema and model after transformation? - c#

I am exploring ML.NET and want to predict employee turnover. I have a dataset available with a mix of numeric and string values.
This is purely exploration while I get to know ML.NET, so my approach is to go through the options step by step and understand each step as well as possible. The steps are:
1. Load the data
2. Prepare the dataset and do a categorical transform on the string features
3. Display the dataset after applying the transformations
4. Split the dataset into a train and a test dataset
5. Train the model with a classification algorithm
6. Evaluate against the test dataset
7. Output the feature weights of the model
8. Do some cool stuff with it
The model is as follows and based on the open source attrition dataset from IBM. https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset
The model:
public class Employee
{
[LoadColumn(0)]
public int Age { get; set; }
[LoadColumn(1)]
//[ColumnName("Label")]
public string Attrition { get; set; }
[LoadColumn(2)]
public string BusinessTravel { get; set; }
[LoadColumn(3)]
public int DailyRate { get; set; }
[LoadColumn(4)]
public string Department { get; set; }
[LoadColumn(5)]
public int DistanceFromHome { get; set; }
[LoadColumn(6)]
public int Education { get; set; }
[LoadColumn(7)]
public string EducationField { get; set; }
[LoadColumn(8)]
public int EmployeeCount { get; set; }
[LoadColumn(9)]
public int EmployeeNumber { get; set; }
[LoadColumn(10)]
public int EnvironmentSatisfaction { get; set; }
[LoadColumn(11)]
public string Gender { get; set; }
[LoadColumn(12)]
public int HourlyRate { get; set; }
[LoadColumn(13)]
public int JobInvolvement { get; set; }
[LoadColumn(14)]
public int JobLevel { get; set; }
[LoadColumn(15)]
public string JobRole { get; set; }
[LoadColumn(16)]
public int JobSatisfaction { get; set; }
[LoadColumn(17)]
public string MaritalStatus { get; set; }
[LoadColumn(18)]
public int MonthlyIncome { get; set; }
[LoadColumn(19)]
public int MonthlyRate { get; set; }
[LoadColumn(20)]
public int NumCompaniesWorked { get; set; }
[LoadColumn(21)]
public string Over18 { get; set; }
[LoadColumn(22)]
public string OverTime { get; set; }
[LoadColumn(23)]
public int PercentSalaryHike { get; set; }
[LoadColumn(24)]
public int PerformanceRating{ get; set; }
[LoadColumn(25)]
public int RelationshipSatisfaction{ get; set; }
[LoadColumn(26)]
public int StandardHours{ get; set; }
[LoadColumn(27)]
public int StockOptionLevel{ get; set; }
[LoadColumn(28)]
public int TotalWorkingYears{ get; set; }
[LoadColumn(29)]
public int TrainingTimesLastYear{ get; set; }
[LoadColumn(30)]
public int WorkLifeBalance{ get; set; }
[LoadColumn(31)]
public int YearsAtCompany{ get; set; }
[LoadColumn(32)]
public int YearsInCurrentRole{ get; set; }
[LoadColumn(33)]
public int YearsSinceLastPromotion{ get; set; }
[LoadColumn(34)]
public int YearsWithCurrManager { get; set; }
}
The string properties are then transformed (as explained here https://learn.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/prepare-data-ml-net#work-with-categorical-data)
var categoricalEstimator = mlContext.Transforms.Categorical.OneHotEncoding("Attrition")
.Append(mlContext.Transforms.Categorical.OneHotEncoding("BusinessTravel"))
.Append(mlContext.Transforms.Categorical.OneHotEncoding("EducationField"))
.Append(mlContext.Transforms.Categorical.OneHotEncoding("Gender"))
.Append(mlContext.Transforms.Categorical.OneHotEncoding("JobRole"))
.Append(mlContext.Transforms.Categorical.OneHotEncoding("MaritalStatus"))
.Append(mlContext.Transforms.Categorical.OneHotEncoding("Over18"))
.Append(mlContext.Transforms.Categorical.OneHotEncoding("OverTime"));
ITransformer categoricalTransformer = categoricalEstimator.Fit(dataView);
IDataView transformedData = categoricalTransformer.Transform(dataView);
Now I want to inspect what has changed (https://learn.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/inspect-intermediate-data-ml-net#convert-idataview-to-ienumerable). The challenge I have now is that after applying a transformation on the string properties, the schema has changed and now contains the expected vectors.
As a result, the Employee class no longer matches the schema of the transformedData object: CreateEnumerable tries to bind a vector column to a string property and throws the following error: "Can't bind the IDataView column 'Attrition' of type 'Vector' to field or property 'Attrition' of type 'System.String'."
IEnumerable<Employee> employeeDataEnumerable =
mlContext.Data.CreateEnumerable<Employee>(transformedData, reuseRowObject: true);
CreateEnumerable also has a schemaDefinition argument, so my first guess was to take the Schema from transformedData and supply that to CreateEnumerable. However, the argument expects a Microsoft.ML.Data.SchemaDefinition, and the schema exposed by the transformed data is a Microsoft.ML.DataViewSchema. So that didn't work either.
I hope someone can advise me on this. Should I do something different?
Full Controller Action:
public ActionResult Turnover()
{
MLContext mlContext = new MLContext();
var _appPath = AppDomain.CurrentDomain.BaseDirectory;
var _dataPath = Path.Combine(_appPath, "Datasets", "WA_Fn-UseC_-HR-Employee-Attrition.csv");
// Load data from file
IDataView dataView = mlContext.Data.LoadFromTextFile<Employee>(_dataPath, hasHeader: true);
// 0. Get the column name of input features.
string[] featureColumnNames =
dataView.Schema
.Select(column => column.Name)
.Where(columnName => columnName != "Label")
.ToArray();
// Define categorical transform estimator
var categoricalEstimator = mlContext.Transforms.Categorical.OneHotEncoding("Attrition")
.Append(mlContext.Transforms.Categorical.OneHotEncoding("BusinessTravel"))
.Append(mlContext.Transforms.Categorical.OneHotEncoding("EducationField"))
.Append(mlContext.Transforms.Categorical.OneHotEncoding("Gender"))
.Append(mlContext.Transforms.Categorical.OneHotEncoding("JobRole"))
.Append(mlContext.Transforms.Categorical.OneHotEncoding("MaritalStatus"))
.Append(mlContext.Transforms.Categorical.OneHotEncoding("Over18"))
.Append(mlContext.Transforms.Categorical.OneHotEncoding("OverTime"));
ITransformer categoricalTransformer = categoricalEstimator.Fit(dataView);
IDataView transformedData = categoricalTransformer.Transform(dataView);
// Inspect (fails because Employee (35 cols) cannot be mapped to the new schema (52 cols))
IEnumerable<Employee> employeeDataEnumerable =
mlContext.Data.CreateEnumerable<Employee>(transformedData, reuseRowObject: true, schemaDefinition : transformedData.Schema);
// split the transformed dataset into training and a testing datasets
DataOperationsCatalog.TrainTestData dataSplit = mlContext.Data.TrainTestSplit(transformedData, testFraction: 0.2);
IDataView trainData = dataSplit.TrainSet;
IDataView testData = dataSplit.TestSet;
return View();
}

I ran into this recently, and as a quick workaround I simply created a new class that matches the transformed data schema. For example, you can create an EmployeeTransformed class with the correct property types (i.e. a vector instead of a string) and use it as follows:
CreateEnumerable<EmployeeTransformed>
This isn't optimal if you are going to create various transformed schemas, but it works.
Hope that helps.
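A minimal sketch of that workaround, assuming the Microsoft.ML NuGet package is referenced. EmployeeRow and EmployeeRowTransformed are hypothetical, cut-down stand-ins for the Employee class in the question, and the in-memory rows are made up for illustration. The only difference between the two classes is that the one-hot encoded column is declared as float[] instead of string:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

// Hypothetical, cut-down input row (stand-in for Employee).
public class EmployeeRow
{
    public float Age { get; set; }
    public string Gender { get; set; }
}

// Same columns after OneHotEncoding: Gender is now a vector of floats.
public class EmployeeRowTransformed
{
    public float Age { get; set; }
    public float[] Gender { get; set; }
}

public static class TransformedInspection
{
    public static List<EmployeeRowTransformed> ReadTransformed()
    {
        var mlContext = new MLContext(seed: 0);

        // Made-up sample data instead of the CSV file.
        IDataView dataView = mlContext.Data.LoadFromEnumerable(new[]
        {
            new EmployeeRow { Age = 41, Gender = "Female" },
            new EmployeeRow { Age = 49, Gender = "Male" },
            new EmployeeRow { Age = 37, Gender = "Male" }
        });

        var estimator = mlContext.Transforms.Categorical.OneHotEncoding("Gender");
        IDataView transformedData = estimator.Fit(dataView).Transform(dataView);

        // Bind to the class whose property types match the transformed schema.
        // reuseRowObject is false because the rows are kept in a list.
        return mlContext.Data
            .CreateEnumerable<EmployeeRowTransformed>(transformedData, reuseRowObject: false)
            .ToList();
    }

    public static void Main()
    {
        foreach (var row in ReadTransformed())
        {
            Console.WriteLine($"Age {row.Age}: Gender = [{string.Join(", ", row.Gender)}]");
        }
    }
}
```

Columns that exist in the transformed data but not in the class are simply not read, so the class only needs the properties you want to inspect.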

For debugging purposes you can also call transformedData.Preview() and look at the data and the resulting Schema.

Related

How to update a table of an existing, scaffolded database in ASP.NET Core

I scaffolded my database successfully, and I tried adding a field to a model:
`
public partial class Cotizaciones
{
private static Random rnd = new Random();
public Cotizaciones()
{
DetalleProductoPersonalizados = new HashSet<DetalleProductoPersonalizado>();
}
[Key]
public int Idcotizacion { get; set; }
public DateTime FechaInicio { get; set; }
public DateTime FechaFin { get; set; }
public double PrecioFinal { get; set; }
public string Ubicacion { get; set; } = null!;
public bool Estado { get; set; }
public int? PaqueteFk { get; set; }
[Column(TypeName = "nvarchar(max)")]
public string? NombreCotizacion = GenerateLetter(); //---> new field
private static string GenerateLetter()
{
StringBuilder fileName = new StringBuilder("");
for (int i = 0; i <= rnd.NextInt64(1,35); i++)
{
fileName.Insert(i, Convert.ToChar(rnd.Next(65, 90)));
}
return fileName.ToString();
}
[NotMapped]
[DisplayName("Subir comprobante de pago")]
public IFormFile ImageFile { get; set; }
public virtual Paquete? PaqueteFkNavigation { get; set; }
public virtual ICollection<DetalleProductoPersonalizado> DetalleProductoPersonalizados { get; set; }
}
`
However, applying migrations reported that no changes were made, and creating a new migration and trying to apply it gives me this message:
There is already an object named 'AspNetRoles' in the database.
You are mixing the model definition with business logic, and EF will not map this: NombreCotizacion is declared as a field, and EF maps properties by convention, not fields. You would need to move the GenerateLetter logic into a different process and make your new addition a true property.
You could possibly use a backing fields implementation to try to get this working, but it will not do what I think you expect it to do.
You will most likely have to rethink how the GenerateLetter method is called if you want to persist the value to the database. You could possibly make it a computed column, but you wouldn't have access to Random etc. there.
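One way to read that advice, sketched under assumptions: NombreCotizacion becomes an auto-property, and the random-name logic moves into a separate helper. NameGenerator and Cotizacion are hypothetical names, and the entity is reduced to two members for brevity. The helper also picks the length once, instead of re-rolling it on every loop iteration as the original loop condition does, and it includes 'Z' (the upper bound of Random.Next is exclusive):

```csharp
using System;
using System.Text;

// Hypothetical helper that keeps the random-name logic out of the entity.
public static class NameGenerator
{
    private static readonly Random Rnd = new Random();

    // Returns between 1 and 34 upper-case ASCII letters.
    public static string Generate()
    {
        int length = Rnd.Next(1, 35); // upper bound is exclusive
        var name = new StringBuilder(length);
        for (int i = 0; i < length; i++)
        {
            name.Append((char)Rnd.Next('A', 'Z' + 1)); // 'Z' + 1 so that Z can occur
        }
        return name.ToString();
    }
}

// Reduced entity: only the members relevant to this example.
public class Cotizacion
{
    public int Idcotizacion { get; set; }

    // A property rather than a field, so EF can map it to a column.
    public string? NombreCotizacion { get; set; } = NameGenerator.Generate();
}
```

With this shape, a newly constructed entity gets a generated name, while an entity loaded from the database has the property overwritten with the stored value.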

Filtering on the Collection Navigation property

I would like to filter my 'TranslationSet' entities, based on their 'Translations' Collection Navigation Property.
E.g.
If a 'Translation' has a 'LanguageId' of 5 (Italian), then the 'TranslationSet' that contains this 'Translation' should be removed from the result.
Here are my Entity classes:
public class Language
{
public int LanguageId { get; set; }
public string NationalLanguage { get; set; }
//Make table multi tenanted.
public int TenantId { get; set; }
public ApplicationTenant Tenant { get; set; }
public List<Translation> Translation { get; set; } = new List<Translation>();
}
public class Translation
{
public int TranslationId { get; set; }
public string TranslatedText { get; set; }
public int LanguageId { get; set; }
public Language Language { get; set; }
//Make table multi tenanted.
public int TenantId { get; set; }
public ApplicationTenant Tenant { get; set; }
public int TranslationSetId { get; set; }
public TranslationSet TranslationSet {get; set;}
}
public class TranslationSet
{
public int TranslationSetId { get; set; }
public int TenantId { get; set; }
public ApplicationTenant Tenant { get; set; }
public IEnumerable<Translation> Translations { get; set; }
}
Here is my attempt
From the image you can see that the query fails because a Translation exists with LanguageId of 5.
I have made many attempts to resolve this, but I can't even get close to a LINQ query that returns the correct result.
Please let me know if any further clarification is needed and thanks in advance to anybody who offers help.
My rule of thumb, which nearly always works, is: start by querying the entities you want. That will prevent duplicates like the ones you see in your query result. Then add predicates to filter the entities, using navigation properties. That will be:
var sets = TranslationSets // start the query here
.Where(ts => ts.Translations.All(t => t.LanguageId != 5)); // Filter
Or if you like this better:
var sets = TranslationSets // start the query here
.Where(ts => !ts.Translations.Any(t => t.LanguageId == 5)); // Filter
EF will translate both queries as WHERE NOT EXISTS.
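To see that the two predicates select the same sets, here is a small in-memory sketch. It runs on LINQ to Objects rather than EF, the entity classes are reduced to the members the filter needs, and the sample data and the TranslationFilter helper are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Translation
{
    public int TranslationId { get; set; }
    public int LanguageId { get; set; }
}

public class TranslationSet
{
    public int TranslationSetId { get; set; }
    public List<Translation> Translations { get; set; } = new List<Translation>();
}

public static class TranslationFilter
{
    // Keeps sets in which every translation has a different language.
    public static List<int> WithAll(IEnumerable<TranslationSet> sets, int excludedLanguageId) =>
        sets.Where(ts => ts.Translations.All(t => t.LanguageId != excludedLanguageId))
            .Select(ts => ts.TranslationSetId)
            .ToList();

    // Keeps sets in which no translation has the excluded language.
    public static List<int> WithNotAny(IEnumerable<TranslationSet> sets, int excludedLanguageId) =>
        sets.Where(ts => !ts.Translations.Any(t => t.LanguageId == excludedLanguageId))
            .Select(ts => ts.TranslationSetId)
            .ToList();

    // Made-up data: set 1 contains language 5, set 2 does not, set 3 is empty.
    public static List<TranslationSet> SampleData() => new List<TranslationSet>
    {
        new TranslationSet
        {
            TranslationSetId = 1,
            Translations =
            {
                new Translation { TranslationId = 1, LanguageId = 1 },
                new Translation { TranslationId = 2, LanguageId = 5 }
            }
        },
        new TranslationSet
        {
            TranslationSetId = 2,
            Translations =
            {
                new Translation { TranslationId = 3, LanguageId = 1 },
                new Translation { TranslationId = 4, LanguageId = 2 }
            }
        },
        new TranslationSet { TranslationSetId = 3 }
    };

    public static void Main()
    {
        var sets = SampleData();
        Console.WriteLine(string.Join(", ", WithAll(sets, 5)));    // prints "2, 3"
        Console.WriteLine(string.Join(", ", WithNotAny(sets, 5))); // prints "2, 3"
    }
}
```

Note that a set without any translations passes both filters, because "all of nothing" is true and "any of nothing" is false.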

C# Searching List of Arrays for specific value and returning related value

I hope this isn't a foolishly simple question. I'm trying to figure out how to work with a relatively simple SQLite table through C#.
I want to take a parameter, search a list of Airport objects for the one where the parameter matches, and return a related value from that same object.
For example where an array in the list might be.
Name IATA
Brisbane BNE
The sqlbind:
public static List<Airport> LoadAirports()
{
using (IDbConnection cnn = new SQLiteConnection(LoadConnectionString()))
{
var output = cnn.Query<Airport>("select * from Airport", new DynamicParameters());
return output.ToList();
}
}
The Class:
class Airport
{
int Id { get; set; }
string Name { get; set; }
string LocationName { get; set; }
string IATA { get; set; }
string PortType { get; set; }
string PortOwner { get; set; }
string MotherPort { get; set; }
bool Active { get; set; }
bool IsApplyMeetAndGreet { get; set; }
decimal MeetAndGreet { get; set; }
}
The main Program:
List<Airport> Airports = new List<Airport>();
public FreightCalculator()
{
LoadAirportsList();
string OriginName = OriginInput.Value;
var OriginAirport = Airports.Where(s => s.Name == OriginName);
}
private void LoadAirportsList()
{
Airports = SqliteDataAccess.LoadAirports();
}
I've tried various combinations of Where, Equals, foreach, indexing, etc., always getting an error of some kind.
The error with the above Airports.Where is that s.Name is inaccessible due to its protection level.
If I do:
var OriginAirport = Airports.Where(Name => Name == OriginName);
I get an error that the operator == cannot be applied to operands of type Airport and string (though Name is a string in Airport).
I'm either missing something simple or making this more complicated than it needs to be. Once I find the matching Airport, I need to return the IATA code.
Which I envisage looking like this:
var OriginIATA = OriginAirport.IATA;
I'm tired and feeling dumb. Please help :(
Since you declared all members of the Airport class as properties, I assume you wanted to expose them publicly.
The error occurs because class members without an access modifier are private by default in C#, so they can't be accessed outside the class.
Change the Airport class to:
class Airport
{
public int Id { get; set; }
public string Name { get; set; }
public string LocationName { get; set; }
public string IATA { get; set; }
public string PortType { get; set; }
public string PortOwner { get; set; }
public string MotherPort { get; set; }
public bool Active { get; set; }
public bool IsApplyMeetAndGreet { get; set; }
public decimal MeetAndGreet { get; set; }
}
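With the properties public, the remaining step from the question (getting the IATA code of the matching airport) could look like the sketch below. Where returns a sequence, so OriginAirport.IATA would still not compile; FirstOrDefault returns a single Airport or null. The AirportLookup helper and the sample rows are hypothetical, and the class is reduced to two properties:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reduced version of the Airport class: only the members used here.
public class Airport
{
    public string Name { get; set; } = "";
    public string IATA { get; set; } = "";
}

public static class AirportLookup
{
    // Returns the IATA code of the first airport with the given name,
    // or null when no airport matches.
    public static string? FindIata(IEnumerable<Airport> airports, string name) =>
        airports.FirstOrDefault(a => a.Name == name)?.IATA;

    public static void Main()
    {
        // Made-up rows standing in for SqliteDataAccess.LoadAirports().
        var airports = new List<Airport>
        {
            new Airport { Name = "Brisbane", IATA = "BNE" },
            new Airport { Name = "Sydney", IATA = "SYD" }
        };

        Console.WriteLine(FindIata(airports, "Brisbane"));          // prints "BNE"
        Console.WriteLine(FindIata(airports, "Atlantis") ?? "n/a"); // prints "n/a"
    }
}
```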

jQuery datatable in MVC (server-side)

https://datatables.net/usage/server-side
On the page above, there are parameters that you need to receive to make server-side datatable work.
I have a helper class
public class TableParameter
{
public string sEcho { get; set; }
public int iDisplayStart { get; set; }
public int iDisplayLength { get; set; }
public int iSortingCols { get; set; }
}
But in order to sort columns I need to receive
string sSortDir_(int)
How do I do that? I know (int) represents column ID that needs to be sorted, but I just can't catch it in my controller.
The datatable will post one or more sSortDir_x parameters to your controller, depending on how many columns are sorted on simultaneously in the table.
The specific columns that the table is sorted by are sent in the iSortCol_ parameters (again, one or more).
public class TableParameter
{
public string sEcho { get; set; }
public int iDisplayStart { get; set; }
public int iDisplayLength { get; set; }
public int iSortingCols { get; set; }
public int iSortCol_0 { get; set; } // the first (and usually only) column to be sorted by
public string sSortDir_0 { get; set; } // the direction of the first column sort (asc/desc)
public int iSortCol_1 { get; set; } // the second column to be sorted by
public string sSortDir_1 { get; set; } // the direction of the second column sort
// etc
}
To receive the column name in the action when sorting by a single column:
public ActionResult SomeMethod(FormCollection coll)
{
var sortingColumnNumber = Convert.ToInt32(coll["iSortCol_0"]);
var sortingColumnName = coll[string.Format("mDataProp_{0}", sortingColumnNumber)];
var propertyInfo = typeof(SomeObject).GetProperty(sortingColumnName);
//..get List<SomeObject> sortedObjects
sortedObjects = sortedObjects.OrderBy(x => propertyInfo.GetValue(x, null)).ToList();
//...
}
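If the number of sorted columns is not fixed, the numbered iSortCol_x and sSortDir_x parameters can also be read in a loop driven by iSortingCols instead of being declared one by one. The sketch below assumes the posted values are available as a NameValueCollection (in classic ASP.NET MVC, FormCollection derives from it); SortParameterParser is a hypothetical helper name:

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Specialized;

public static class SortParameterParser
{
    // Reads iSortCol_0..n-1 and sSortDir_0..n-1, where n is iSortingCols.
    public static List<(int Column, string Direction)> Parse(NameValueCollection form)
    {
        var result = new List<(int Column, string Direction)>();

        // count stays 0 when iSortingCols is missing or not a number.
        int.TryParse(form["iSortingCols"], out int count);

        for (int i = 0; i < count; i++)
        {
            if (!int.TryParse(form["iSortCol_" + i], out int column))
            {
                continue; // skip entries that are missing or malformed
            }

            // Anything other than "desc" is treated as ascending.
            string direction = form["sSortDir_" + i] == "desc" ? "desc" : "asc";
            result.Add((column, direction));
        }

        return result;
    }

    public static void Main()
    {
        // Made-up request values: sort by column 3 descending, then column 0 ascending.
        var form = new NameValueCollection
        {
            { "iSortingCols", "2" },
            { "iSortCol_0", "3" }, { "sSortDir_0", "desc" },
            { "iSortCol_1", "0" }, { "sSortDir_1", "asc" }
        };

        foreach (var (column, direction) in Parse(form))
        {
            Console.WriteLine($"column {column} {direction}");
        }
    }
}
```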

How should I handle lookups in my ViewModel?

My database table for buildings stores the building type as a code. In a separate lookup table the description for that code is stored.
How should I design my ViewModel and where will I need to make the call to get the associated description value?
I sort of can see one option. I want to know if there is a better option.
BuildingViewModel
{
public string BuildingTypeCode { get;set;}
...other properties
}
Then in my view
code...
<p>@MyService.GetDescription(Model.BuildingTypeCode)</p>
...code
Am I thinking about this incorrectly? If I do the above, do I create a dependency on the service in my view?
Update 1
Working through some of the solutions offered. I seem to run into another issue. I can't access the constructor of each building directly...
public ViewResult Show(string ParcelId)
{
var result = _service.GetProperty(ParcelId);
var AltOwners = _service.GetAltOwners(ParcelId);
var Buildings = _service.GetBuildings(ParcelId);
ParcelDetailViewModel ViewModel = new ParcelDetailViewModel();
ViewModel.AltOwnership = new List<OwnerShipViewModel>();
ViewModel.Buildings = new List<BuildingViewModel>();
AutoMapper.Mapper.Map(result, ViewModel);
AutoMapper.Mapper.Map<IEnumerable<AltOwnership>, IEnumerable<OwnerShipViewModel>>(AltOwners,ViewModel.AltOwnership);
AutoMapper.Mapper.Map<IEnumerable<Building>, IEnumerable<BuildingViewModel>>(Buildings, ViewModel.Buildings);
ViewModel.Pool = _service.HasPool(ParcelId);
ViewModel.Homestead = _service.IsHomestead(ParcelId);
return View(ViewModel);
}
public class ParcelDetailViewModel
{
public IEnumerable<OwnerShipViewModel> AltOwnership { get; set; }
//public IEnumerable<ValueViewModel> Values { get; set; }
public IEnumerable<BuildingViewModel> Buildings { get; set; }
//public IEnumerable<TransferViewModel> Transfers { get; set; }
//public IEnumerable<SiteAddressViewModel> SiteAddresses { get; set; }
public string ParcelID { get; set; }
//public string ParcelDescription { get; set; }
//public int LandArea { get; set; }
//public string Incorporation { get; set; }
//public string SubdivisionCode {get;set;}
public string UseCode { get; set; }
//public string SecTwpRge { get; set; }
//public string Census { get; set; }
//public string Zoning { get; set; }
public Boolean Homestead {get;set;}
//public int TotalBuildingArea { get; set; }
//public int TotalLivingArea { get; set; }
//public int LivingUnits { get; set; }
//public int Beds { get; set; }
//public decimal Baths { get; set; }
public short Pool { get; set; }
//public int YearBuilt { get; set; }
}
My understanding is that the view model is meant for display ready data. I think the real problem here is putting model dependent logic into the view.
You can do your service lookup but keep that code in the controller. The view model should be considered display ready (save for some formatting).
class BuildingViewModel
{
public string BuildingTypeCode { get;set;}
...other properties
}
and then do the lookup before you render:
public ActionResult Building()
{
var typeCode = // get from original source?
var model = new BuildingViewModel
{
BuildingTypeCode = MyService.GetDescription(typeCode)
};
return View("Building", model);
}
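For the list of buildings in Update 1, the same idea can be applied while mapping: resolve each description before the view model reaches the view. The sketch below is framework-free and rests on assumptions: IBuildingTypeLookup, InMemoryBuildingTypeLookup, BuildingMapper, the BuildingTypeDescription property and the sample codes are all hypothetical names, with the lookup standing in for the real service:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reduced domain object: only the code that needs a description.
public class Building
{
    public string BuildingTypeCode { get; set; } = "";
}

// Display-ready view model: carries the description, not just the code.
public class BuildingViewModel
{
    public string BuildingTypeCode { get; set; } = "";
    public string BuildingTypeDescription { get; set; } = "";
}

// Hypothetical abstraction over the lookup table.
public interface IBuildingTypeLookup
{
    string GetDescription(string code);
}

// In-memory stand-in for the real lookup service.
public class InMemoryBuildingTypeLookup : IBuildingTypeLookup
{
    private readonly Dictionary<string, string> _descriptions = new Dictionary<string, string>
    {
        { "RES", "Residential" },
        { "COM", "Commercial" }
    };

    // Falls back to the code itself when no description is known.
    public string GetDescription(string code) =>
        _descriptions.TryGetValue(code, out var description) ? description : code;
}

public static class BuildingMapper
{
    // Called from the controller, so the view never touches the service.
    public static List<BuildingViewModel> ToViewModels(
        IEnumerable<Building> buildings, IBuildingTypeLookup lookup) =>
        buildings.Select(b => new BuildingViewModel
        {
            BuildingTypeCode = b.BuildingTypeCode,
            BuildingTypeDescription = lookup.GetDescription(b.BuildingTypeCode)
        }).ToList();

    public static void Main()
    {
        var buildings = new[] { new Building { BuildingTypeCode = "RES" } };
        foreach (var vm in ToViewModels(buildings, new InMemoryBuildingTypeLookup()))
        {
            Console.WriteLine($"{vm.BuildingTypeCode}: {vm.BuildingTypeDescription}"); // prints "RES: Residential"
        }
    }
}
```

Keeping the code and the description as separate properties leaves the original code available for links or form posts, at the cost of one extra property per lookup.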
Having come from a long line of JSP custom tags I dread having any code hidden in the view layout. IMO, that layer should be as dumb as possible.
I would recommend having a helper that does that, or a DisplayTemplate
public class ViewHelpers
{
// Wraps the service call so views depend on this helper only.
public static string GetDescription(string code)
{
return MyService.GetDescription(code);
}
}
or
@ModelType string
@Html.DisplayFor("",MyService.GetDescription(Model.BuildingTypeCode));
More info on templates: http://www.headcrash.us/blog/2011/09/custom-display-and-editor-templates-with-asp-net-mvc-3-razor/
Both of these approaches introduce a dependency on your service but you can test/change them in one single place, instead of the whole application (plus the usage looks cleaner).
