Fast Way to Replace Names with Ids in Datatable?

I have a very large CSV file I have to load on a regular basis that contains time series data. Examples of the headers are below:
| SiteName | Company | Date | ResponseTime | Clicks |
This data comes from a service external to the uploader. SiteName and Company are both string fields. In the database these are normalized. There is a Site table and a Company table:
CREATE TABLE [dbo].[Site] (
[Id] INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
[Name] NVARCHAR(MAX) NOT NULL
)
CREATE TABLE [dbo].[Company] (
[Id] INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
[Name] NVARCHAR(MAX) NOT NULL
)
As well as the data table.
CREATE TABLE [dbo].[SiteStatistics] (
[Id] INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
[CompanyId] INT NOT NULL,
[SiteId] INT NOT NULL,
[DataTime] DATETIME NOT NULL,
CONSTRAINT [SiteStatisticsToSite_FK] FOREIGN KEY ([SiteId]) REFERENCES [Site]([Id]),
CONSTRAINT [SiteStatisticsToCompany_FK] FOREIGN KEY ([CompanyId]) REFERENCES [Company]([Id])
)
At around 2 million rows in the CSV file, any sort of IO-bound, row-by-row iteration isn't going to work. I need this done in minutes, not days.
My initial thought is to pre-load Site and Company into DataTables. I already have the CSV loaded into a DataTable whose columns match the CSV columns. I now need to replace every SiteName with the Id field of Site and every Company with the Id field of Company. What is the quickest, most efficient way to handle this?

If you go with pre-loading the sites and companies, you can get the distinct values from the loaded DataTable like this:
DataView view = new DataView(table);
DataTable distinctCompanyValues = view.ToTable(true, "Company");
DataTable distinctSiteValues = view.ToTable(true, "SiteName");
Then load those two DataTables into their SQL tables using SqlBulkCopy, mapping each source column to [Name].
Next, bulk-load all the CSV data into a version of the table that temporarily keeps the name columns:
CREATE TABLE [dbo].[SiteStatistics] (
[Id] INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
[CompanyId] INT DEFAULT 0,
[SiteId] INT DEFAULT 0,
[Company] NVARCHAR(MAX) NOT NULL,
[Site] NVARCHAR(MAX) NOT NULL,
[DataTime] DATETIME NOT NULL
)
Then run an UPDATE to populate the foreign key columns:
UPDATE ss SET
ss.[CompanyId] = c.[Id],
ss.[SiteId] = s.[Id]
FROM [dbo].[SiteStatistics] ss
INNER JOIN [dbo].[Company] c ON c.[Name] = ss.[Company]
INNER JOIN [dbo].[Site] s ON s.[Name] = ss.[Site]
Add the Foreign Key constraints:
ALTER TABLE [SiteStatistics] ADD CONSTRAINT [SiteStatisticsToSite_FK] FOREIGN KEY ([SiteId]) REFERENCES [Site]([Id])
ALTER TABLE [SiteStatistics] ADD CONSTRAINT [SiteStatisticsToCompany_FK] FOREIGN KEY ([CompanyId]) REFERENCES [Company]([Id])
Finally delete the Site & Company name fields from SiteStatistics:
ALTER TABLE [SiteStatistics] DROP COLUMN [Company];
ALTER TABLE [SiteStatistics] DROP COLUMN [Site];
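If you would rather do the replacement in memory, as the question proposes, a dictionary lookup keeps the pass over 2 million rows at roughly constant time per row. The following is a minimal sketch, not a tuned implementation: the names `NameToIdMapper`, `BuildLookup` and `ReplaceNameWithId` are hypothetical, the lookup tables are assumed to expose `Id` and `Name` columns as in the schema above, and the case-insensitive comparer is an assumption that should be matched to your database collation.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class NameToIdMapper
{
    // Builds a Name -> Id dictionary from a pre-loaded lookup table (Site or Company).
    // Assumption: names compare case-insensitively; adjust to the database collation.
    public static Dictionary<string, int> BuildLookup(DataTable lookupTable)
    {
        var map = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (DataRow row in lookupTable.Rows)
        {
            map[(string)row["Name"]] = (int)row["Id"];
        }
        return map;
    }

    // Adds an int column filled from the lookup, then drops the original name column.
    public static void ReplaceNameWithId(
        DataTable data, string nameColumn, string idColumn, Dictionary<string, int> lookup)
    {
        data.Columns.Add(idColumn, typeof(int));
        foreach (DataRow row in data.Rows)
        {
            string name = (string)row[nameColumn];
            int id;
            if (!lookup.TryGetValue(name, out id))
            {
                // Fail loudly rather than writing a row with a missing foreign key.
                throw new KeyNotFoundException("No Id found for name '" + name + "'.");
            }
            row[idColumn] = id;
        }
        data.Columns.Remove(nameColumn);
    }
}

public static class NameToIdDemo
{
    public static void Main()
    {
        // Stand-in for the Site table loaded from the database.
        var sites = new DataTable();
        sites.Columns.Add("Id", typeof(int));
        sites.Columns.Add("Name", typeof(string));
        sites.Rows.Add(1, "example.com");
        sites.Rows.Add(2, "example.org");

        // Stand-in for the DataTable loaded from the CSV.
        var csv = new DataTable();
        csv.Columns.Add("SiteName", typeof(string));
        csv.Columns.Add("Clicks", typeof(int));
        csv.Rows.Add("example.org", 10);
        csv.Rows.Add("example.com", 7);

        NameToIdMapper.ReplaceNameWithId(
            csv, "SiteName", "SiteId", NameToIdMapper.BuildLookup(sites));

        foreach (DataRow row in csv.Rows)
        {
            Console.WriteLine(row["SiteId"] + " " + row["Clicks"]);
        }
    }
}
```

The same call would be repeated for the Company column. The resulting DataTable already carries the Id columns, so it could be passed to SqlBulkCopy directly, without the temporary name columns and the UPDATE.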

Related

Replacing a SQL Server trigger with C# code: is there a better way than insert-then-update?

I have a table in which a trigger fills one of the columns if it is null or empty. I want to delete the trigger and do its job in code.
Do I have to insert first and then update, or is there a better way?
I am on .NET Framework, and the ORM is NHibernate.
CREATE TABLE [dbo].[Table]
(
[Id] INT NOT NULL PRIMARY KEY,
[Col1] NVARCHAR(50) NOT NULL,
[Col2] NVARCHAR(50) NOT NULL,
[Code] NVARCHAR(100) NULL
);
CREATE TRIGGER Update_Table
ON [dbo].[Table]
AFTER INSERT
AS
BEGIN
DECLARE @id INT
SELECT @id = Id
FROM inserted
UPDATE [dbo].[Table]
SET Code = 'CODE' + CAST(Id AS NVARCHAR(20))
FROM [dbo].[Table]
WHERE Id = @id AND Code IS NULL
END
This is what I do now:
Table entity = new Table() { Col1 = "aaa", Col2 = "bbb" };
entity = _repo.insert(entity);
entity.Code = "CODE" + entity.Id;
_repo.Update(entity);
Sometimes I do not need the update, because the user supplies a value for this column:
Table entity = new Table() { Col1 = "aaa", Col2 = "bbb", Code = "ccc" };
entity = _repo.insert(entity);
Insert followed by update works. I am just looking for a better way.

I would simplify it by making Code a computed column, like this:
CREATE TABLE [dbo].[Table]
(
[Id] INT NOT NULL PRIMARY KEY,
[Col1] NVARCHAR(50) NOT NULL,
[Col2] NVARCHAR(50) NOT NULL,
[Code] AS 'Code' + CAST(Id as NVARCHAR)
)
That way, Code is populated automatically whenever a row is inserted.

Notwithstanding Nino's answer, an interceptor is a common way to achieve this.
Update:
It appears that event listeners are also an applicable technique: https://stackoverflow.com/a/867356/1162077
You don't say how you generate the entity id when it isn't supplied, so the event you intercept or handle will depend on how you're doing that.
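Whichever hook is used (an interceptor, an event listener, or a plain call in the repository), the logic being moved out of the trigger is small. Here is a minimal sketch, assuming the Id is already known before the insert (for example, because the application assigns it). The `TableEntity` class and the `CodeDefaults.EnsureCode` helper are hypothetical names, not NHibernate APIs.

```csharp
using System;

// Hypothetical entity mirroring the columns of [dbo].[Table].
public class TableEntity
{
    public int Id { get; set; }
    public string Col1 { get; set; }
    public string Col2 { get; set; }
    public string Code { get; set; }
}

public static class CodeDefaults
{
    // Fills Code only when the caller did not supply one, mirroring the
    // "null or empty" rule of the trigger. Assumes entity.Id is already set.
    public static void EnsureCode(TableEntity entity)
    {
        if (entity == null) throw new ArgumentNullException("entity");
        if (string.IsNullOrEmpty(entity.Code))
        {
            entity.Code = "CODE" + entity.Id;
        }
    }
}

public static class CodeDefaultsDemo
{
    public static void Main()
    {
        var generated = new TableEntity { Id = 7, Col1 = "aaa", Col2 = "bbb" };
        var supplied = new TableEntity { Id = 8, Col1 = "aaa", Col2 = "bbb", Code = "ccc" };

        CodeDefaults.EnsureCode(generated); // Code was empty, so it is derived from Id
        CodeDefaults.EnsureCode(supplied);  // Code was supplied, so it is left alone

        Console.WriteLine(generated.Code);
        Console.WriteLine(supplied.Code);
    }
}
```

If the database generates the Id during the insert, this can only run once the Id is available, which brings back the insert-then-update pattern from the question.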

Procedure not working because of unresolved reference to object

I'm writing a WPF application where at some point I try to add a new row to my database through a procedure like the one below:
CREATE PROCEDURE dbo.InsertStudent
@IdStudent INT,
@FirstName VARCHAR(50),
@LastName VARCHAR(50),
@Address VARCHAR(50),
@IndexNumber VARCHAR(50),
@IdStudies INT
AS
SET NOCOUNT ON
INSERT INTO [dbo].[apbd.Student]
([IdStudent]
,[FirstName]
,[LastName]
,[Address]
,[IndexNumber]
,[IdStudies])
VALUES
(@IdStudent
,@FirstName
,@LastName
,@Address
,@IndexNumber
,@IdStudies)
but whenever I try to use it, I get this error:
SQL71502: Procedure: [dbo].[InsertStudent] has an unresolved reference to object [dbo].[apbd.Student].
I looked for a solution, but all I found was advice to add a database reference by right-clicking References, and I do not have that option in my Solution Explorer.
Maybe I'm looking in the wrong place, but the only options I get after right-clicking are these (translated from my localized Visual Studio, so the exact wording may differ):
Add Reference...
Add Service Reference...
Add Connected Service
Add Analyzer...
Manage NuGet Packages...
As for the code that creates the tables in the database:
CREATE SCHEMA apbd;
GO
-- tables
-- Table: Student
CREATE TABLE apbd.Student (
IdStudent int NOT NULL IDENTITY,
FirstName nvarchar(100) NOT NULL,
LastName nvarchar(100) NOT NULL,
Address nvarchar(100) NOT NULL,
IndexNumber nvarchar(50) NOT NULL,
IdStudies int NOT NULL,
CONSTRAINT Student_pk PRIMARY KEY (IdStudent)
);
-- Table: Student_Subject
CREATE TABLE apbd.Student_Subject (
IdStudentSubject int NOT NULL IDENTITY,
IdStudent int NOT NULL,
IdSubject int NOT NULL,
CreatedAt datetime NOT NULL,
CONSTRAINT Student_Subject_pk PRIMARY KEY (IdStudentSubject,IdStudent,IdSubject)
);
-- Table: Studies
CREATE TABLE apbd.Studies (
IdStudies int NOT NULL IDENTITY,
Name nvarchar(100) NOT NULL,
CONSTRAINT Studies_pk PRIMARY KEY (IdStudies)
);
-- Table: Subject
CREATE TABLE apbd.Subject (
IdSubject int NOT NULL IDENTITY,
Name nvarchar(100) NOT NULL,
CONSTRAINT Subject_pk PRIMARY KEY (IdSubject)
);
-- End of file.

By default, objects in a SQL Server database are created in the dbo schema. You can add further schemas to group objects for security or organizational purposes.
In your case, the schema apbd was created and Student was created in that schema, not in dbo. So, to reference that table, you need to use [apbd].[Student] rather than [dbo].[apbd.Student].

I would run the following to determine the actual name and schema of the table:
SELECT
CAST(
MAX(
CASE
WHEN
TABLE_SCHEMA = 'apbd'
AND TABLE_NAME = 'Student'
THEN 1
ELSE 0
END
) AS bit
) [The table is apbd.Student]
,
CAST(
MAX(
CASE
WHEN
TABLE_SCHEMA = 'dbo'
AND TABLE_NAME = 'apbd.Student'
THEN 1
ELSE 0
END
) AS bit
) [The table is dbo.apbd.Student]
FROM INFORMATION_SCHEMA.TABLES
I'm also wondering if you perhaps need a USE statement at the start of your CREATE script - are you creating the procedure on the right database?
If the table is on a different database you would need to reference the database in your stored procedure, i.e. [DatabaseName].[dbo].[apbd.Student].

How do I link 3 tables using foreign keys?

I have made a table named "reservations" which contains a customer id and a house id. I made tables for houses and customers as well. I have a datagrid that shows the reservations data, but I also want it to show the customer's surname and the house code.
My tables are (in SQL Server Express):
CREATE TABLE [dbo].[houses]
(
[Id] INT IDENTITY (1, 1) NOT NULL,
[Code] VARCHAR(50) NULL,
[Status] VARCHAR(50) NULL,
PRIMARY KEY CLUSTERED ([Id] ASC)
);
CREATE TABLE [dbo].[customers]
(
[Id] INT IDENTITY (1, 1) NOT NULL,
[Forename] VARCHAR(50) NULL,
[Surname] VARCHAR(50) NULL,
[Email] VARCHAR(50) NULL,
PRIMARY KEY CLUSTERED ([Id] ASC)
);
CREATE TABLE [dbo].[reservations]
(
[Id] INT IDENTITY (1, 1) NOT NULL,
[HouseId] INT NULL,
[CustomerId] INT NULL,
[StartDate] DATE NULL,
[EindDate] DATE NULL,
PRIMARY KEY CLUSTERED ([Id] ASC),
CONSTRAINT [FK_HouseId]
FOREIGN KEY ([HouseId]) REFERENCES [houses]([Id]),
CONSTRAINT [FK_CustomerId]
FOREIGN KEY ([CustomerId]) REFERENCES [customers]([Id])
);
I already created all the tables, but I don't know how to link them properly. I want to get the data and put it in a datagrid.

To select all the reservation data together with the customer's surname and the house code, run this query:
Select R.*, C.Surname, H.Code
From [dbo].[reservations] R
inner join [dbo].[customers] C on C.Id = R.CustomerId
inner join [dbo].[houses] H on H.Id = R.HouseId

Try this:
select r.*, c.Surname, h.Code from reservations r, customers c, houses h where
r.CustomerId = c.Id and r.HouseId = h.Id

Using a nullable unique key in Entity Framework

I am developing a C# application with .NET 4.5.2 and an Oracle database.
The problem is that I can't get Entity Framework configured correctly for the database. The database contains a unique key over 4 columns, but 2 of them are nullable.
[Key, Column("GEBRUIKER", Order = 0)]
public string User { get; set; }
[Key, Column("EL1", Order = 1)]
public string Element1 { get; set; }
[Key, Column("EL2", Order = 2)]
public short? Element2 { get; set; }
[Key, Column("EL3", Order = 3)]
public short? Element3 { get; set; }
When I read the values from the database with this mapping, I get a null reference exception whenever Element2 or Element3 is empty.
When I remove the Key attributes from Element2 and Element3, I don't get the right data: when two rows share the same Element1, the second row comes back with the first row's (cached) Element2 and Element3.
Question: How can I handle these nullable unique keys?
Added extra information:
Well, this is a part of the create script of the database:
CREATE TABLE USERS
(
GEBRUIKER VARCHAR2(3 BYTE) NOT NULL,
EL1 VARCHAR2(6 BYTE) NOT NULL,
EL2 NUMBER,
EL3 NUMBER
)
CREATE UNIQUE INDEX USERS_UK ON USERS
(GEBRUIKER, EL1, EL2, EL3)
ALTER TABLE USERS ADD (
CONSTRAINT USERS_UK
UNIQUE (GEBRUIKER, EL1, EL2, EL3)
USING INDEX USERS_UK
ENABLE VALIDATE);
It is not possible to change the structure or the data, because multiple applications use this database. There are also already rows where EL2 and EL3 have the value 0.
Example data:
{'USER1','A','A','C'}
{'USER1','A','B','B'}
{'USER1','B','A','C'}
When I run a LINQ query to select USER1 and EL1 = A, I get the following result:
{'USER1','A','A','C'}
{'USER1','A','A','C'}
instead of:
{'USER1','A','A','C'}
{'USER1','A','B','B'}

No, you cannot do that.
Look at your table creation SQL:
CREATE TABLE [dbo].[Users](
[GEBRUIKER] [nvarchar](128) NOT NULL,
[EL1] [nvarchar](128) NOT NULL,
[EL2] [smallint] NOT NULL,
[EL3] [smallint] NOT NULL,
CONSTRAINT [PK_dbo.Users] PRIMARY KEY CLUSTERED
(
[GEBRUIKER] ASC,
[EL1] ASC,
[EL2] ASC,
[EL3] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
All primary key columns are NOT NULL!
Suppose you want to hack this key and change it yourself with SQL. You would then have to run a script like the following, which changes the key column from non-nullable to nullable:
usersDatabaseContext.Database.ExecuteSqlCommand(@"
while(exists(select 1 from INFORMATION_SCHEMA.TABLE_CONSTRAINTS where CONSTRAINT_TYPE='FOREIGN KEY'))
begin
declare @sql nvarchar(2000)
SELECT TOP 1 @sql=('ALTER TABLE ' + TABLE_SCHEMA + '.[TableName] DROP CONSTRAINT [' + CONSTRAINT_NAME + ']')
FROM information_schema.table_constraints
WHERE CONSTRAINT_TYPE = 'FOREIGN KEY'
exec (@sql)
end
ALTER TABLE Users DROP CONSTRAINT [PK_dbo.Users]
ALTER TABLE Users ALTER COLUMN EL2 SMALLINT NULL
ALTER TABLE Users
ADD CONSTRAINT [PK_dbo.Users] PRIMARY KEY CLUSTERED ( [GEBRUIKER] ASC, [EL1] ASC, [EL2] ASC, [EL3] ASC )",
TransactionalBehavior.DoNotEnsureTransaction);
var user1 = new User { UserName = "Bassam", Element1 = "1", Element2 = null, Element3 = 3 };
In this script I have:
- Removed all foreign keys related to this table
- Dropped the primary key
- Changed **column EL2 to nullable**
- Added the modified primary key again
This causes an error in both SQL Server and Oracle. In the case of SQL Server, the message is:
Cannot define PRIMARY KEY constraint on nullable column in table 'Users'.
Think carefully about it: even if you could do this, what would happen to your primary key?
How could you guarantee the uniqueness of your primary key? Suppose your User entity creates the following row:
Id1 , Id2 , Id3 , Id4
"a" , "a" , null , null
You could not create the same entry again, because that row would already exist in the table!

Entity Framework, random query paging

This is what I want to achieve:
1. Query my db to return a list of entities
2. Randomize the list
3. Store the IDs of the items received for future queries
4. Run a new query on the same table where the IDs are in the list that I have stored
5. Order by the list that I have stored.
I have already managed steps 1 to 4, but step 5 is difficult. Can anyone help me with a query like this:
SELECT *
FROM table_name
WHERE id IN (1,2,3,4....)
ORDER BY (1,2,3,4....)
Thanks in advance

Try
SELECT table_name.*
FROM crazy_sorted_table
LEFT JOIN
table_name ON crazy_sorted_table.ID=table_name.ID

A normal join (equi-join) should do the trick. Here is a sample approach I tested:
/**crazyOrder filled 100 rows with random value from 1-250 in Id**/
CREATE TABLE [dbo].[crazyOrder] (
[Id] INT NOT NULL,
[Area] VARCHAR (50) NULL,
PRIMARY KEY CLUSTERED ([Id] ASC)
);
/**Normal order is filled with value from 1-100 sequentially in id**/
CREATE TABLE [dbo].[normalOrder] (
[Id] INT NOT NULL,
[Name] VARCHAR (50) NULL,
PRIMARY KEY CLUSTERED ([Id] ASC)
);
create table #tempOrder
(id int)
insert into #tempOrder
Select top 10 Id
from crazyOrder
order by NewID()
go
Select n.*
from normalOrder n
join #tempOrder t
on t.id = n.id
I was able to retrieve the rows in the same order as in the temp table (I used a data generator for the values). Note that without an ORDER BY clause SQL Server does not guarantee that order.
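Since SQL gives no ordering guarantee without an ORDER BY, another option for step 5 is to re-order the fetched entities in memory against the stored ID list. Here is a minimal sketch, assuming the result set is small enough to sort on the client. `StoredOrder.OrderByIdList` and its `idSelector` parameter are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StoredOrder
{
    // Returns the items whose id appears in orderedIds, in the order of orderedIds.
    public static List<T> OrderByIdList<T>(
        IEnumerable<T> items, IList<int> orderedIds, Func<T, int> idSelector)
    {
        // Map each id to its position in the stored list (last position wins on duplicates).
        var position = new Dictionary<int, int>();
        for (int i = 0; i < orderedIds.Count; i++)
        {
            position[orderedIds[i]] = i;
        }

        return items
            .Where(item => position.ContainsKey(idSelector(item)))
            .OrderBy(item => position[idSelector(item)])
            .ToList();
    }
}

public static class StoredOrderDemo
{
    public static void Main()
    {
        // The randomized ID list stored in step 3.
        var storedIds = new List<int> { 42, 7, 19 };

        // Stand-in for the rows returned by the second query, in arbitrary order.
        var fetched = new[]
        {
            new KeyValuePair<int, string>(7, "seven"),
            new KeyValuePair<int, string>(19, "nineteen"),
            new KeyValuePair<int, string>(42, "forty-two")
        };

        var ordered = StoredOrder.OrderByIdList(fetched, storedIds, kv => kv.Key);
        Console.WriteLine(string.Join(", ", ordered.Select(kv => kv.Key)));
    }
}
```

With Entity Framework, the fetch itself would typically be a `Where(x => storedIds.Contains(x.Id))` query, which is translated to an IN clause; the re-ordering then happens after the results are materialized.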
