Microsoft Access for Beginners - Part I
Organizing the Data
Contents
Introduction
A Different Set of
Tools
Tables and Relationships
Types of Relationships
Referential Integrity
Changing Direction
Remaining Tables
Conclusion
Recently, a friend talked to me about setting up a database for tracking project
information within her company. She had never used Access before and wasn't
sure if she wanted to do it herself or ask me to take on the project. I always
like to see people learn something new so I told her we could do it either way.
A couple weeks later, she sent me her first attempt at a database, asking me not
to laugh.
What my friend had designed was actually typical of many beginners' attempts at
Access design. She had used Access in much the same way as she would use Excel;
one table with fields for all of the information that needed to be tracked.
Since more than one record will often refer to the same project, items such as customer
and contractor names were repeated from one record to the next. In addition
to taking up more space, this can also lead to errors and mis-spellings that will
make it difficult to find the information needed for various reports.
It's understandable that many people automatically try to use Access this way as
most people are familiar with the spreadsheet concept and many are unfamiliar with
higher database concepts. Access presents tables that look something like
the spreadsheet format that people are used to and, since it's offered as part of
the Microsoft Office suite, it's easy to see it in relation to the other programs
and overlook its power as a standalone product.
For this series, I've designed another example based on an employee information
database. The information includes contact information for all employees,
both past and present, as well as current job information and other data required
by the company and various agencies.
(Note: I use the employee database example as something that is familiar to
most people. This does not mean that I recommend using Access for the type
of extremely sensitive information that can be stored in such an application.
For more information on the limitations of Access security, please see Information
Security in Microsoft Access , a supplemental article to this
series.)
The Excel spreadsheet at the link shown below is what a beginning effort at tracking
this information might look like. In this first chapter, I'll introduce some
of the concepts involved in the process of translating this information into a well-designed
Access database.
Back to Top
Sample Files
If you have not already done so, please download the demonstration files:
Demo.zip - Access demonstration database and original Excel spreadsheet
(These files require Access and Excel 2000 or later.)
A
companion slideshow is also available
for this chapter.
A Different Set of Tools
Access is referred to as a relational database system for two reasons:
- Data is organized into tables which are also called relations. This is because
each table has a subject (i.e. tblPersonnel for employees) and every field in a
given table is supposed to be directly related to the subject of the table.
If you look at the design of the Personnel table, you will see that each of the
fields is a piece of information that pertains to a given employee. You also
won't see more than one address for an employee or more than one job title.
- Tables can be linked or related to each other. In the sample database, this
is demonstrated by the link between tblPersonnel and tblAbsences (shown below).
Since there can and likely will be more than one absence for each employee, the
absence information is placed in a separate table and the two tables are linked
by a number assigned to the employee. This eliminates the duplicate entry
of information such as the employee name and department. This, in turn, reduces
the size of the database and the possibility of data entry error and corruption.

By selecting Relationships from the Tools menu in Access, you can view the links
between the tables in the database. In the Personnel database, a relationship
is defined between tblPersonnel and tblAbsences.
This is how Access differs from a program like Excel. While Excel organizes
information and allows for calculations, Access takes it a step further and lets
you define relationships between different categories of data. This also enables
easier analysis of your information.
Let's start by looking at the original table as it's shown in the spreadsheet.
An Excel spreadsheet showing the original table listing the employee information.
The original spreadsheet shows the necessary information for each employee with
separate pages for employee absences and lookup information for the different departments.
This retains the information but it is tedious to use and mistakes are easy to make.
There are no controls over the editing of information. Also note the mis-spellings
and unlikely dates in certain records which can happen in a spreadsheet that is
updated over months and years by a busy HR clerk.
One of the goals of a user-friendly Access database is to minimize the amount of
work required to enter and retrieve the information, thereby reducing workload and
the possibility of error. This is where the ability to group data into separate
tables comes in handy. The tables you create and the relationships between
them are the backbone of your database. Once you have the tables designed
correctly, you're halfway there.
Back to Top
Tables and Relationships
After downloading the demonstration files, start by opening the demo database.
Press the F11 key to show the database objects window and select Relationships from
the Tools menu at the top of the screen. The database window will show you
all the tables and other objects in the database. The Relationships window
will show you the relationships between the tables.

(Click for example of screen in Access 2007)
In another chapter, I'll go into more detail about actually creating a table and
using the settings available for fields and tables. My interest here is to
show you the proper way to design what's called the database schema, which is the
collection of related tables and other objects that form the structure of the database.
We'll start by looking at three of the tables in the database; the tblPersonnel,
tblAddresses and tblAbsences tables. The central table in the database is tblPersonnel
which contains the essential information for each employee. In accordance
with good database design, it only contains the information that specifically relates
to the employee such as the name, relevant dates and status. In this case,
all of the other tables in the database link to it in one way or another.

If you view the design of tblPersonnel by selecting the table in the database window
and clicking on the Design button, you will see that the EmpKey field is identified
as the table's Primary Key. This means that the field contains a value that
is unique for each record in the table, enabling the database to identify that particular
record. Access encourages you to create a primary key for every table that
you design.
In the second column of the design view, the data type assigned to the EmpKey field
is called AutoNumber. This is a unique number assigned by the database to
every record that the user enters and cannot be edited by the user. In most cases,
an AutoNumber should not be of any significance to the people using the program
and can even be hidden from the user's view.
For example, you would not want to use this number as an employee number within
the company because there is no guarantee that it won't skip numbers if records
are deleted or cancelled. It will also assign the numbers in the order that
the records are entered while employee numbers are often assigned in order of hire.
An AutoNumber used as a primary key only serves as a unique value and to allow for
the linking of related tables as you'll see in a moment.
 |
 |
 |
Design views of tblPersonnel, tblAddresses and tblAbsences. (Click for larger view)
|
(It is possible to set other non-AutoNumber fields to be unique so that the database
will not allow the same value to be entered in the field twice.)
You'll also notice a few extra fields in each table. Another advantage that
comes with grouping information in separate tables is the ability to store more
information without getting lost in it as you might with a large spreadsheet.
In the bottom half of the table design screen, you'll see some extra settings for
each field. When adding fields to a table, in particular you should pay attention
to settings such as field size, input mask, default value, the validation settings
and required. Proper use of these settings will provide the controls needed
to help ensure that data is accurately entered. I provide more information
on these properties in the next chapter and you can get more information on any
of them by placing the cursor in one of the setting boxes and pressing F1.
Access has a few rules concerning field names but is very flexible in what it will
accept. I try to keep my field names relatively short but descriptive enough
to avoid confusion. One rule that I follow is that while Access will allow
for spaces in field names, I don't use them. As your databases become more
sophisticated, you may find that they lead to confusion and possible errors.
Also, be careful of using reserved words like "Date" which is an Access function.
Access will allow you to use them but they can cause errors later that are hard
to track down.
Both tblAbsences and tblAddresses also have fields titled EmpKey but the data type
is Number instead of AutoNumber. In terms of the relationships between these
tables and tblPersonnel, the EmpKey field in tblAbsences is referred to as the foreign key while it's called the primary
key in tblPersonnel. This is because the value originates in tblPersonnel and corresponding
records in tblAbsences will relate back to tblPersonnel by having the same value
in the foreign key field. Another feature of tblAbsences is what's called
a composite key, or a key that is made up of more than
one field. In this case, you cannot have more than one record with the same
combination of employee, date and pay status.
Substituting the EmpKey value for the employee name throughout the other tables
not only saves space but eliminates misspellings of names and allows for additional
report functions that you will see later. It also solves the problem of having
more than one employee with the same name. By using the number, the database
knows exactly who you're talking about.
At this point you may be thinking "Why not just use the SSN for the primary key,
since it's unique and you're storing it in another table anyway?" I cannot
stress it enough - NEVER use a value as confidential as a Social Security Number
or credit card number as a table key. In another part of the series, I will
talk about how to hide this information from those who should not have access to
it. You cannot prevent users from viewing table keys if they wish, especially
if they're referenced by every other table as in this database.
Back to Top
Types of Relationships
By looking at the relationships between these tables, you can see how these relationships
are defined. Drawing a relationship between two tables or double-clicking
on an existing relationship line will bring up the relationship edit screen where
you can view the settings. In the relationship between tblPersonnel and tblAbsences,
the 1and ∞ symbols above the line indicate a one-to-many relationship
between the tables. This means that for every employee listed in tblPersonnel,
there can be many absences recorded.

The relationship between tblPersonnel and tblAddresses is different. In this
case, you'll see a 1 on both sides of the relationship. There is
only one address recorded for every employee and it is called a one-to-one relationship.
The same is true with tblEEOCData, tblNumber and tblPastEmployees.
There are a number of reasons why information may be split off into another table
like this. In this case, it's mainly to reduce the size of tblPersonnel by
keeping a number of fields related to the same aspect of employee information (i.e.
the contact info) in another table. The information is still just as accessible
by the use of queries and forms which I'll talk about in later chapters. Another
reason is for security. Access to things like Social Security Numbers often
needs to be limited to certain personnel. Putting it into a separate table
is the first step. With appropriate assignment of user rights and design of the
program interface, you can place a certain degree of security on this and other
sensitive information. Those are also topics for later chapters but when designing
a database, it is important to anticipate security needs and other concerns from
the very start.
(
Click here for more information about Access security
)
A third type of relationship - a many-to-many relationship - is sometimes needed.
While it is not shown in this database, an example would be if the database tracked
employee training classes. Each employee might take many classes and there
would be many employees per class. Many-to-many relationships are not directly
possible in Access. To create them, it is necessary to create a third intervening
table that breaks the relationship down into two one-to-many relationships.
In the employee training class example, you might have a registration table that
would include fields for the Employee ID and the Class ID as well as any other information
pertinent to that employee's registration in the class. This table would then
link to both the employee and class tables on those fields.

Back in the relationships window, you'll also notice that the direction of the arrow
points to tblAbsences. If you click on the Join Type button on the Edit Relationships
panel, you can see that the relationship is set to show all records from tblPersonnel
and then any records in tblAbsences that match those records. This will pull
data from tblPersonnel for employees even if are no records for that employee in
tblAbsences and is referred to as a left outer join.
The first option would pull only the records where the value for this field is present
in both tables and this is called an inner join.
In this case, employees with no absences would not be listed which may or may not
be what you want. You'll see the importance of using the proper join type
when you start designing queries to work with your table data.
Going back to the original spreadsheet, notice the last page titled "Departments".
This reference information is brought into the database through the tblDepartments
lookup table shown in the relationships window. If you view the table design,
you'll notice that the primary key, DeptNo, is not an AutoNumber field. I
left it as a number field since many companies assign their own numbers to various
internal departments. The table is only linked to tblPersonnel. Again, a lookup
field will be provided so that when entering the department information for an employee,
the user selects from a list of department names and the department number is substituted
in tblPersonnel.
In this case, referencing the department number instead of the name has another
advantage. If the company should reorganize, as companies often do, and a department
name changes, the name only has to be changed once in the departments lookup table
and all of the employees records for that department will reference the new name.
If departments merge or are eliminated, simple functions can be designed to change
one number to another in all the records. For more information, see my article
on creating a lookup field.
By using its relationship to tblDepartments, the personnel table stores a number
corresponding to a given department.
Because of the lookup function added to the Department field of tblPersonnel for
this example, the table displays the appropriate title even though only its number
is stored.
Back to Top
Referential Integrity
In the bottom half of the Edit Relationships box, you'll see a section dealing with
something called Referential Integrity. This is a method for ensuring that
data from one table that is supposed to match data in another table is not changed
on only one side of the relationship by accident.
An example of this would be if the number assigned to a department in tblDepartments
was changed or a department was deleted. If there were employees assigned
to that department, this could cause a problem. Without referential integrity, you
would be left with employees who were not assigned to an existing department.
It would get even worse if a number of departments were changed at once. With referential
integrity activated, Access protects records and primary keys that are referenced
by records in other tables. If you try to delete a record that is referenced
by another table, you might get a message like this one:
On the other side of the example, it would not be possible for someone to enter
a non-existent department number in tblPersonnel because the database would not
find a corresponding department in the tblDepartments.
Of course, in some cases, you do need to be able to delete or change records, even
if they are referenced in other tables. The two options under referential
integrity for 'cascade' updating and deleting of records address this. Using
cascade update, if a record's primary key is changed, that value is also changed
throughout the database in any records that reference it. If cascade delete
is active, then instead of getting the message shown above, you'll get something
like this:
If you select Yes, then any records in other tables relating to that record would
be deleted. In the case of the relationship between tblPersonnel and tblDepartments,
a cascading update would change the department number in the appropriate records
in tblPersonnel. If a department was deleted, a cascading delete would delete
any employee records for that department. The options for referential integrity
are not automatically appropriate for every situation. Always take the nature
and needs of the data being stored into account when you set options for tables
and relationships.
Back to Top
Changing Direction
The skills table (tblSkills) demonstrates another significant difference between
common spreadsheet and database design. In the original spreadsheet, you'll
notice the last three columns on the main page list the significant skills for each
employee. Since there are only three columns, the assumption is that the listing
is limited to three skills per employee and not much detail is provided.
Another problem is that while the arrangement allows for some information to be
entered on each employee, it doesn't accommodate searches for specific skills.
Just from this small sampling of employees, notice that there is more than one person
with an MOS (Microsoft Office Specialist) certification. If there were a couple
hundred employees listed here and you wanted to do a search for everyone with this
or other qualifications, you would have to do a manual search on all three columns.
This is not such a big deal in Excel but if this is transferred to an Access table,
it's a different story.
The solution in Access is to actually change the direction of the data. Instead
of listing the skills horizontally, tblSkills lists a record for each instance of
a skill along with the appropriate EmpKey (employee) value and a field for additional
information such as years of experience or type of degree. In the screenshot
above, you can see the employee name substituted for the number by the lookup function
along with the skill description and details. In a later chapter, you'll see how
this makes it easy to search on a specific skill and pull up a printable list of
all qualified employees.
Back to Top
Remaining Tables
The last tables in the database are tblNotes and tblPastEmployees which may be self-explanatory.
The notes table is a generic storage place for any comments and notes that may need
to be entered for each employee. Its unique feature is that while it references
the EmpKey field in tblPersonnel, it does not use it as part of its own key as no
restriction is needed concerning the number of notes entered for a given employee.
The primary key is an AutoNumber field that only serves as a unique identifier.
When I discuss the design of forms, you will see how the data from this table can
be displayed in the same form as the rest of the employee data while taking up a
minimum of space on the form and still allowing the user to see all available records.
The past employees table is another one-to-one relationship where the data is split
off from the main employees table. In this case it is done to make specific
reporting on past employees a little easier. Another reason is to avoid a
lot of blank records in the main table pertaining to information that is only available
for a small percentage of people who are no longer with the company.
Back to Top
Conclusion
This process of organizing data in Access or any other type of database is called
data normalization. A normalized database follows a set of rules known as
Normal Forms. While I've provided a practical example of this process,
Microsoft's support site also offers a couple of very good articles which detail
the rules of normalization and provide more examples. For more information,
please refer to the following Microsoft Knowledge Base articles:
283878 - Database Normalization
Basics
283698 - ACC2002:
"Understanding Relational Database Design"
As I discussed the various tables in this database, you may have found yourself
thinking that it looks like a lot more data entry than is shown in the original
spreadsheet. If the tables were actually intended for data entry, that would
be true. This is why I almost never allow users to enter data directly into
the tables - that's what forms are for! Properly designed forms use the relationships
between the tables to bring much of the data entry work together in one place.
They can automatically supply many of the values required in the tables and can
also include interesting features like drop-down lists and default values that eliminate
a lot of typing and help to ensure correct data entry. I have a couple of
chapters later in the series on form
design that will show you how to create a good user interface for
your application.
In the
next chapter , I'll go into greater
detail about how you can create tables like the ones you've seen here.
Back to Top
For more information:
Return to Series Index