Monday, October 12, 2009

sql server database design interview questions

Last updated: 10/5/2010 9:19 PM
Database design interview question
most common approaches to design schema
#DB design approaches

  1. Gather the requirements from FRS(functional requirement specification) or the most common one is from the prototype of the application or from the existing legacy application database.
  2. create a conceptual model (use case) , <you can skip this if you have prototype or existing DB design in legacy>
  3. Create a logical model (ER Modeling) from this conceptual model( logical model contains only table,columns,relations) with out specifying data types
  4. apply the normalization rules to reduce the redundancy (duplicates) and avoid the anomalies( problems with deleting or updating the data with out primary key)
  5. convert the logical model to physical model (choose RDBMS)(data types, constraints, indexes etc...) from the logical model
  6.  create a prototype or from the existing prototype of your application and start design
     test the model with the prototype.
Database design interview question
What are the advantages and disadvantages of Normalization
#is my database normalized

Advantages: reduce the redundancy(dupliate) ,avoid null values, avoid update anomalies(refer update anomalies)

Disadvantages: too many joins required to produce reports.
Database design interview question
Normalization or De-normalization. Which one to choose ?
#Tradeoff matrix

It depends. A tradeoff between no. of updates vs. reports will guide till what level the database should be normalized or de-normalized.
Database design interview question
What is your approach on normalizing the database to improve the performance of the database

Refer above questions answer.
Database design interview question
Have you experienced any problems or failures in database design?

·        Assigning too large data types, thinking that the length of the data would grow in future. E.g.:. Customer_first_name varchar (90) – in general this will not be 90, but if the designer doesn’t have any idea on the realistic values then this leads to occupying more pages for small data.
·        Not doing any normalization or improper normalization: when designing the database focusing on meeting the requirements, sometimes the designer might ignore to normalize the database and especially doing the tradeoff between No. of Insert/Update/Deletes vs. reports.
Tip: validate the designing through having a standard checklist helps to make sure that the design is up to the mark.

·        Don’t keep the audit data in the same table, look at the below table where the table has the audit data itself where you can’t maintain any constraints (fk, unique). Requires maintaining the integrity at the coding level where it leads to complex SQL statements instead of simple queries.

01-jan-1999 5 AM
01-jan-1999 5 AM
01-jan-1999 5 AM
o   Resolution: create a view with a SQL statement to produce only the active data and use these views in your front-end application. Rest of all the tables will be act as audit tables.
·        Don’t design the database by assumptions, all the assumptions must be clarified with all the stakeholders before finalizing the database design.
·        Freeze the scope of the project before finalizing the database design
·        I had seen many projects that the front-end developers are mostly complaining about the database that the design is not well and good and leads to the poor performance. This is in fact true and below is the most possible scenarios in most of the projects which makes the design not up to the mark for performance.
o   Not following any trade-off matrix before making any decisions on changing the database design during the every build – This is the most common problem as most of the project requires some changes and the designer/developer has to be very careful before making any change on the schema, he has to consider the impact of the change WRT time, cost, scope and has to think about how to mitigate as well.
Database design interview question
What is the difference between Primary Key and Unique Key
#heap table

A unique key can also be a primary key and can involve in joins.
Differences are: when you create a primary key by default it will create a clustered index whereas for the unique key by default it will create non clustered index.
A table will contains only one primary key in a table, but can contains more than one unique key.
Primary key won’t allow any null value, but unique key allows only one null.
Database design interview question
Can a foreign key allow null value?

Yes.(also refer cardinality)
Database design interview question
Mention few best practices to design a database

·        Select the right design tool to design the database, there are many tools available in the market and the most popular one is Erwin, but make sure that Erwin ( not sure about the current version)  itself has some problems in it and you must know about these before using the Erwin.
§  One of the most challenging task when working with Erwin is during making any changes , Erwin generates the scripts which contains the drop and recreate statements which has to be analyzed first before executing directly on the development database which might make the database inconsistent if the script fails to execute at the middle.
§  Creates a non-clustered index by default on the primary key, so make sure there won’t be any heap tables created by Erwin.
§  When you modify the table in Erwin which has any comments defined then the script generated by Erwin will not contains any of these comments.
§  Finally make sure its 100% compatible with the version of the SQL Server you are using.
·        Define the scope of your project and finalize the database design before moving to the development as any change in the schema leads to the major changes in the application development.
·        Conform the logical design with the requirements and get the sign-off
·        Tip: It’s possible to confirm the design of the database as per requirements through the prototype of the application.
·        follow a proper naming convention
·        try to define the data types as short as possible ( refer SQL Server Page 8 KB)
·        Don’t use the text data type unless its highly required as querying the data inside a text data type sometime requires to build a full text catalog

Database design interview question
What is the difference between Primary Key and Candidate Key

Candidate key: A single column or set of columns that can uniquely identifies the row in a table. a candidate key can acts as a primary key and you can define only candidate key as a primary key.
Primary Key: A key or set of keys that can uniquely identifies the row in a table.
E.g.:  sometimes when you declare a surrogate key (column with identity property) as a primary key then the actual business primary key will acts as candidate key.
Database design interview question
What is cardinality

It’s a property to specify the number of instances of an entity that can be existed in each side of the relationship. E.g.:. One-One or null, One-Many or null
Database design interview question
What is Functional Dependency?

It’s a special relationship among columns in a table .A set of attributes X in table functionally dependent on set of attributes Y on the same table where the values in attributes Y  identifies the values X.
E.g.:. Employee Address, Employee phone functionally depends on Employee Name.
Database design interview question
What is Tuple

Row in a table
Database design interview question
What is a Schema?

Table, columns and the relations between the tables.
Note: the concept of schema in SQL Server 2005 & 2008 is different from the concept of schema in relational theory where the SQL Server 2005 & 2008 refers to the logical container of database objects.
Database design interview question
what is the difference between DBMS and RDBMS
# SQL Server and Codd Rules
Rather than differentiating between the DBMS and RDBMS in detail, the fact is all the leading database software available in the market are DBMS.
So what is DBMS: software to manage the database (collection of data) through insert/update/delete / retrieval of data through a systematic and structured way (SQL) and manages the hierarchical, network, relational, object oriented databases.
To define any DBMS as RDBMS it has to satisfy all the 12 rules defined by E.F Codd.
There is an assumption that if the system has any relational databases then it’s an RDBMS which is not really true. For e.g.: MS Access is not an RDBMS though the recent version maintains the relationships through primary and foreign keys.
Is SQL Server is DBMS or RDBMS
Yes, but not 100%. As per E.F Codd rule 0, to define any system as an RDBMS it has to satisfy all the 12 rules.
Most of the leading software in the market are violating the Rule-9: Logical Data Independence.
Database design interview question
can you create 2 tables with the same name
yes , in a different schemas
SQL server 2008 Database design interview question
Can you enforce the naming convention in SQL Server to have all the tables should follow as defined?
Yes, through policy based management.

Open SSMS àManagement àPolicy Management àFacets à right-click Table, and then click New Condition
à type Table starts with tbl_
* In the Facet box, confirm that Table is selected.
* In Expression area, in the Field box, select @Name, in the Operator box select LIKE, and in the Value type "tbl_%"
* Optionally, you can type a description of the condition, by clicking on the Description tab.
Database design interview question
what are the new data types in SQL Server 2005
Database design interview question
What are the new data types in SQL Server 2008/2012/2014
Database design interview question
what is sparse column and its pros and cons
As per BOL “columns reduce the space requirements for null values at the cost of more overhead to retrieve non-null values”
So, when the column value is NULL for any row in the table, the values require no storage, but when it has any value then it takes few more bytes than its actual size.

It’s better to go for this only when the column has lot many null values.

Refer BOL for the limitations and estimated space.
Database design interview question
How will you implement a tree structure (one employee reports to another who is an employee of the organization itself) in design?
What are the challenges in managing this type of tables?
refer the dev-tree struct-qries

Int ( primary key , identity 1,1)
char(9) , unique key
int (foreign key references emp_id

when you enter the data it looks like the below :

bill gates
vice president
Jeff raikes

Challenges: populating data from the text files is bit challenging.
If you try to insert the new data and update the existing data from the below text file into this table then you need to write at least one cursor.

marketing manager
Jim alchen
David Campbel
Paul Flessner
sr.vice president

Database design interview question
what is the disadvantage of having a foreign key
There is no any disadvantage in particular and in fact it supports the referential integrity which is the major advantage.
But internally when there is a foreign key means there is a overhead on the SQL Server system :
Before delete a master record: the engine will look at the existence of the child record in foreign key table internally.
before insert or updating a primary key a child record : the engine will look into the master record in the primary key table
conclusion : so it’s always better to create a clustered index on the primary key and a non-clustered index on a foreign key
why there should be a clustered index on every primary key and non-clustered index on every foreign key
Refer above question.
give an example for the generalization
Generalization: when 2 or more entities shares the common attribute can be GENERALIZED into a higher level entity type.
E.g. Permanent (name) | Contractors (name)
Super Entity / Generalized entity is Employees (Login Name)

what is a surrogate key and how you will implement surrogate key in SQL Server
what are the advantages and disadvantage of having Identity Keys in SQL Server
basis on what you will define a identity key
Identity key (auto increment key) is an integer field with auto increment property. this can be act as a surrogate key when there is no primary key in the table or the length of the primary key is large.

Advantages of having a identity key is its auto increment, so we no need to enter a value in this field.
which NF the below table “customers” is in

123, cb street,new jersey, usa

which NF the below table “customers” is in

cb street
new jersey

what is normalization define 1-NF, 2-NF,3NF
1NF – No Repetitive Groups , Each non-key attribute should functionally dependent on key attribute

2NF – (! Applies if it has composite Key) – It should be in 1NF and every non-key attribute should functionally dependent on the whole key (composite key , if exists) and NOT just part of the key

3NF (!Applies if it has computed columns or dependent columns on non-keys) – It should be 2NF, NO transitive dependency (No dependencies between non-key attributes)
what is a Heap Table and when will you make heap tables
A table without a clustered index is called a heap table
Though the table has non-clustered index , but with OUT a clustered index still considered as heap table
what is collation and how will you define a collation at column level
Collation means character set (language) . it could be USA English, zapan,etc..
give an example for 1-1 relationship
A table with large number of columns where some of the column groups has more nulls can be isolated to a different table one-one relationships to improve the performance on updates since it takes less space and more rows can be fit in single page belongs to the table
give an example for 1- many relationship
Database design interview question
Is your database is DBMS?
this is a very basic and a funny question J
Your database is not same as the database software (Oracle/SQL Server).
Its incorrect to say that my database is an DBMS
what is the difference between and when to choose what:

complete create table syntax
performance tuning considerations
security considerations
Below are the most common mistakes :
Including sensitive information : passwords , SSN, DOB etc in the database and replacing the values with the binary – though a developer cant see the values by default , but by using a convert function he can get the results.

Its always recommended to use the encryption function with a key Eg:.openpgp function or through using sql server certificates or using a customized algorithm instead using binary datatype
Index – fill factor – pad index – page split


SQL Lion said...

Very Nice post, quiet informative.
Database designing problems being the buzzwords these days, one of the most common designing mistakes is the “Poor Planning / Architecture” of the database or mart. Follow the link to know more…

Anonymous said...

Nice post... Sir


Anonymous said...

Very helpful post. Thank you.

Featured Post

SQL Server AWS Migration BCP

stored procedure to generate BCP scritps to migrate the SQL Server database. Developed this stored procedure on my labs to simulate the Mi...