Additional features and updates on all the newest articles are available on  Drewslair 24/7!

 Subscribe to Drewslair 24/7


View Andrew Comeau's profile on LinkedIn


Microsoft Certified Professional

 

 

 

Microsoft Access for Beginners - Part I
Organizing the Data

Contents

Introduction
A Different Set of Tools
Tables and Relationships
Types of Relationships
Referential Integrity
Changing Direction
Remaining Tables
Conclusion  

Introduction

Recently, a friend talked to me about setting up a database for tracking project information within her company.  She had never used Access before and wasn't sure if she wanted to do it herself or ask me to take on the project.  I always like to see people learn something new so I told her we could do it either way.  A couple weeks later, she sent me her first attempt at a database, asking me not to laugh.

What my friend had designed was actually typical of many beginners' attempts at Access design.  She had used Access in much the same way as she would use Excel; one table with fields for all of the information that needed to be tracked.  Since more than one record will often refer to the same project, items such as customer and contractor names were repeated from one record to the next.  In addition to taking up more space, this can also lead to errors and mis-spellings that will make it difficult to find the information needed for various reports.

For this series, I've designed another example based on an employee information database.  In this case, the information includes contact information for all employees, both past and present, as well as current job information and other data required by the company and various agencies. 

(Note:  I use the employee database example as something that is familiar to most people.  This does not mean that I recommend using Access for the type of extremely sensitive information that can be stored in such an application.  For more information on the limitations of Access security, please see Information Security in Microsoft Access , a supplemental article to this series.)

The Excel spreadsheet at the link shown below is what a beginning effort at tracking this information might look like.  In this article, I'll introduce some of the concepts involved in the process of translating this information into a well-designed Access database.   

It is not my intention in these articles to go into detail on how to create everything that I show but only to introduce you to the various concepts involved in properly designing an Access database and some of the possibilities that exist.  I've tried to include enough information so that you can easily find additional information you might need through the help files.

Back to Top

Sample Files

If you have not already done so, please download the demonstration files:

Demo.zip - Access demonstration database and original Excel spreadsheet (Requires Access and Excel 2000 or above)

A companion slideshow is also available for this chapter.

A Different Set of Tools

Access is referred to as a relational database system for two reasons:

  1. Data is organized into tables which are also called relations.  This is because each table has a subject (i.e. tblPersonnel for employees) and every field in a given table is supposed to be directly related to the subject of the table.  If you look at the design of the Personnel table, you will see that each of the fields is a piece of information that pertains to a given employee.  You also won't see more than one address for an employee or more than one job title.
  2. Tables can be linked or related to each other.  In the sample database, this is demonstrated by the link between tblPersonnel and tblAbsences.  Since there can and likely will be more than one absence for each employee, the absence information is placed in a separate table and the two tables are linked by a number assigned to the employee.  This eliminates the duplicate entry of information such as the employee name and department.  This, in turn, reduces the size of the database and the possibility of data entry error and corruption. 

This is how Access differs from a program like Excel.  While Excel organizes information and allows for calculations, Access takes it a step further and lets you define relationships between different categories of data. This also enables easier analysis of your information. 

Let's start by looking at the original table as it's shown in the spreadsheet. 

An Excel spreadsheet showing the original table listing the employee information.

An Excel spreadsheet showing the original table listing the employee information.

The original spreadsheet shows the necessary information for each employee with separate pages for employee absences and lookup information for the different departments.  This retains the information but it is tedious to use and mistakes are easy to make.  There are no controls over the editing of information.  Also note the mis-spellings and unlikely dates in certain records which can happen in a spreadsheet that is updated over months and years by a busy HR clerk.

One of the goals of a user-friendly Access database is to minimize the amount of work required to enter and retrieve the information, thereby reducing workload and the possibility of error.  This is where the ability to group data into separate tables comes in handy.  The tables you create and the relationships between them are the backbone of your database.  Once you have the tables designed correctly, you're halfway there.

Back to Top

Tables and Relationships

After downloading the demonstration files, start by opening the demo database.  Press the F11 key to show the database objects window and select Relationships from the Tools menu at the top of the screen.  The database window will show you all the tables and other objects in the database.  The Relationships window will show you the relationships between the tables.

The database window is accessed by pressing F11 and shows the details for all objects in the database.

In another chapter, I'll go into more detail about actually creating a table and using the settings available for fields and tables.  My interest here is to show you the proper way to design what's called the database schema, which is the collection of related tables and other objects that form the structure of the database.

We'll start by looking at three of the tables in the database; the employee, addresses and absences tables. The central table in the database is tblPersonnel which contains the essential information for each employee.  In accordance with good database design, it only contains the information that specifically relates to the employee such as the name, relevant dates and status.  In this case, all of the other tables in the database link to it in one way or another. 

The relationships window shows the relationships that have been defined between database tables.

If you view the design of tblPersonnel by selecting the table in the database window and selecting Design, you will see that the EmpKey field is identified as the table's Primary Key.  This means that the field contains a value that is unique for each record in the table, enabling the database to identify that particular record.  Access encourages you to creaate a primary key for every table that you design.

In the second column of the design view, the data type assigned to the EmpKey field is called AutoNumber.   This is a unique number assigned by the database to every record that the user enters and cannot be edited by the user. In most cases, an AutoNumber should not be of any significance to the people using the program and can even be hidden from the user's view. 

For example, you would not want to use this number as an employee number within the company because there is no guarantee that it won't skip numbers if records are deleted or cancelled.  It will also assign the numbers in the order that the records are entered while employee numbers are often assigned in order of hire.  An AutoNumber used as a primary key only serves as a unique value and to allow for the linking of related tables as you'll see in a moment.

(It is possible to set other non-AutoNumber fields to be unique so that the database will not allow the same value to be entered in the field twice.)

You'll also notice a few extra fields in each table.  Another advantage that comes with grouping information in separate tables is the ability to store more information without getting lost in it as you might with a large spreadsheet.

In the bottom half of the table design screen, you'll see some extra settings for each field.  When adding fields to a table, in particular you should pay attention to settings such as field size, input mask, default value, the validation settings and required.  Proper use of these settings will provide the controls needed to help ensure that data is accurately entered.  I provide more information on these properties in the next chapter and you can get more information on any of them  by placing the cursor in one of the setting boxes and pressing F1.  All I will say at this point is that it does take practice to learn to use them properly so be patient.

Design view for tblPersonnel Design view for tblAddresses Design view for tblAbsences
Design views of tblPersonnel, tblAddresses and tblAbsences. (Click for larger view)

(Access has a few rules concerning field names but is very flexible in what it will accept.  I try to keep my field names relatively short but descriptive enough to avoid confusion.  One rule that I follow is that while Access will allow for spaces in field names, I don't use them.  As your databases become more sophisticated, you may find that they lead to confusion and possible errors.  Also, be careful of using reserved words like "Date" which is an Access function.  Access will allow you to use them but they can cause errors later that are hard to track down.)

By looking at the relationship line between two tables, you can determine the relationship between them and type of join.

Both tblAbsences and tblAddresses also have fields titled EmpKey but the data type is Number instead of AutoNumber.  In terms of the relationships between these tables and tblPersonnel, the EmpKey field in tblAbsences is referred to as the foreign key while it's called the primary key in tblPersonnel. This is because the value originates in tblPersonnel and corresponding records in tblAbsences will relate back to tblPersonnel by having the same value in the foreign key field.  Another feature of tblAbsences is what's called a composite key, or a key that is made up of more than one field.  In this case, you cannot have more than one record with the same combination of employee, date and pay status.

Substituting the EmpKey value for the employee name throughout the other tables not only saves space but eliminates misspellings of names and allows for additional report functions that you will see later.  It also solves the problem of having more than one employee with the same name.  By using the number, the database knows exactly who you're talking about.  

(At this point you may be thinking "Why not just use the SSN for the primary key, since it's unique and you're storing it in another table anyway?"   I cannot stress it enough - NEVER use a value as confidential as a Social Security Number or credit card number as a table key.  In another part of the series, I will talk about how to hide this information from those who should not have access to it.  You cannot prevent users from viewing table keys if they wish, especially if they're referenced by every other table as in this database.)

Back to Top

Types of Relationships

By looking at the relationships between these tables, you can see how these relationships are defined.   Drawing a relationship between two tables or double-clicking on an existing relationship line will bring up the relationship edit screen where you can view the settings. In the relationship between tblPersonnel and tblAbsences, the 1and ∞ symbols above the line indicate a one-to-many relationship between the tables.  This means that for every employee listed in tblPersonnel, there can be many absences recorded.

The relationship between tblPersonnel and tblAddresses is different.  In this case, you'll see a 1 on both sides of the relationship.  There is only one address recorded for every employee and it is called a one-to-one relationship.  The same is true with tblEEOCData, tblNumber and tblPastEmployees.

There are a number of reasons why information may be split off into another table like this.  In this case, it's mainly to reduce the size of tblPersonnel by keeping a number of fields related to the same aspect of employee information (i.e. the contact info) in another table.  The information is still just as accessible by the use of queries and forms which I'll detail in later chapters.  Another reason is for security.  Access to things like Social Security Numbers often needs to be limited to certain personnel.  Putting it into a separate table is the first step. With appropriate assignment of user rights and design of the program interface, you can place a certain degree of security on this and other sensitive information.  Those are also topics for later chapters but when designing a database, it is important to anticipate security needs and other concerns from the very start.

( Click here for more information about Access security )

A third type of relationship - a many-to-many relationship - is possible when relating data.  While it is not shown in this database, an example would be if the database tracked employee training classes.  Each employee might take many classes and there would be many employees per class.  Many-to-many relationships are not directly possible in Access. To show them, it is necessary to create a third intervening table that breaks the relationship down into two one-to-many relationships. 

Back in the relationships window, you'll also notice that the direction of the arrow points to tblAbsences.  If you click on the Join Type button on the Edit Relationships panel, you can see that the relationship is set to first show records from tblPersonnel and then any records in tblAbsences that match those records.  The uses for this will become clearer when you start building queries and forms to analyze and view your data..

Going back to the original spreadsheet, notice the last page of the spreadsheet titled "Departments".  This reference information is brought into the database through the tblDepartments lookup table shown in the relationships window.  If you view the table design, you'll notice that the primary key, DeptNo, is not an AutoNumber field.  I left it as a number field since many companies assign their own numbers to various internal departments. The table is only linked to tblPersonnel.  Again, a lookup field will be provided so that when entering the department information for an employee, the user selects from a list of department names and the department number is substituted in tblPersonnel.

In this case, referencing the department number instead of the name has another advantage. If the company should reorganize, as companies often do, and a department name changes, the name only has to be changed once in the departments lookup table and all of the employees records for that department will reference the new name.  If departments merge or are eliminated, simple functions can be designed to change one number to another in all the records.  

By using its relationship to tblDepartments, the personnel table stores a number corresponding to a given department.

Because of the lookup function added to the Department field of tblPersonnel for this example, the table displays the appropriate title even though only its number is stored.

Back to Top

Referential Integrity

By activating referential integrity on a relationship and choosing the appropriate settings, you can ensure consistent updates and / or deletions of related data.

In the bottom half of the Edit Relationships box, you'll see a section dealing with something called Referential Integrity.  This is a method that Access uses to ensure that data from one table that is supposed to match data in another table is not changed on only one side of the relationship by accident. 

An example of this would be if the number assigned to a department in tblDepartments  was changed or a department was deleted.  If there were employees assigned to that department, this could cause a problem. Without referential integrity, you would be left with employees who were not assigned to an existing department.  It would get even worse if a number of departments were changed at once. With referential integrity activated, Access protects records and primary keys that are referenced by records in other tables.  If you try to delete a record that is referenced by another table, you might get a message like this one:

With referential integrity active and cascade delete off, the database will not allow the deletion of a record that is referenced by the data in another table.

On the other side of the example, it would not be possible for someone to enter a non-existent department number in tblPersonnel because the database would not find a corresponding department in the tblDepartments.

Of course, in some cases, you do need to be able to delete or change records, even if they are referenced in other tables.  The two options under referential integrity for 'cascade' updating and deleting of records address this.  Using cascade update, if a record's primary key is changed, that value is also changed throughout the database in any records that reference it.  If cascade delete is active, then instead of getting the message shown above, you'll get something like this:

If cascade delete is turned on, the database will ask for verification before deleting the record(s).

If you select Yes, then any records in other tables relating to that record would be deleted.  In the case of the relationship between tblPersonnel and tblDepartments, a cascading update would change the department number in the appropriate records in tblPersonnel.  If a department was deleted, a cascading delete would delete any employee records for that department. The options for referential integrity are not automatically appropriate for every situation.  Always take the nature and needs of the data being stored into account when you set options for tables and relationships.

Back to Top

Changing Direction

The skills table (tblSkills) demonstrates another significant difference between common spreadsheet and database design.  In the original spreadsheet, you'll notice the last three columns on the main page list the significant skills for each employee.  Since there are only three columns, the assumption is that the listing is limited to three skills per employee and not much detail is provided. 

Another problem is that while the arrangement allows for some information to be entered on each employee, it doesn't accommodate searches for specific skills.  Just from this small sampling of employees, notice that there is more than one person with an MOS (Microsoft Office Specialist) certification.  If there were a couple hundred employees listed here and you wanted to do a search for everyone with this or other qualifications, you would have to do a manual search on all three columns.  This is not such a big deal in Excel but if this is transferred to an Access table, it's a different story.

Instead of using multiple columns to show the employee skills, tblSkills lists them by row and enables the user to store more detail.

The solution in Access is to actually change the direction of the data.  Instead of listing the skills horizontally, tblSkills lists a record for each instance of a skill along with the appropriate EmpKey value and a field for additional information such as years of experience or type of degree.  In the screenshot above, you can see the employee name substituted for the number by the lookup function along with the skill description and details. In a later chapter, you'll see how this makes it easy to search on a specific skill and pull up a printable list of all qualified employees.

Back to Top

Remaining Tables

The last tables in the database are tblNotes and tblPastEmployees which may be self-explanatory.  The notes table is a generic storage place for any comments and notes that may need to be entered for each employee. Its unique feature is that while it references the EmpKey field in tblPersonnel, it does not use it as part of its own key as no restriction is needed concerning the number of notes entered for a given employee.  The primary key is an AutoNumber field that only serves as a unique identifier.  When I discuss the design of forms, you will see how the data from this table can be displayed in the same form as the rest of the employee data while taking up a minimum of space on the form and still allowing the user to see all available records.

The past employees table is another one-to-one relationship where the data is split off from the main employees table.  In this case it is done to make specific reporting on past employees a little easier.  Another reason is to avoid a lot of blank records in the main table pertaining to information that is only available for a small percentage of people who are no longer with the company. Back to Top

Conclusion

This process of organizing data in Access or any other type of database is called data normalization.  A normalized database follows a set of rules known as Normal Forms.  While I've provided a practical example of this process,  Microsoft's support site also offers a couple of very good articles which detail the rules of normalization and provide examples.  For more information, please refer to the following Microsoft Knowledge Base articles:

283878 - Database Normalization Basics
283698 - ACC2002: "Understanding Relational Database Design"

As I discussed the various tables in this database, you may have found yourself thinking that it looks like a lot more data entry than is shown in the original spreadsheet.  If the tables were actually intended for data entry, that would be true.  This is why I almost never allow users to enter data directly into the tables ' that's what forms are for!  Properly designed forms use the relationships between the tables to bring much of the data entry work together in one place.  They will automatically supply many of the values required in the tables and can also include interesting features like drop-down lists and default values that eliminate a lot of typing and help to ensure correct data entry.

In the next chapter , I'll go into greater detail about how you can create tables like the ones you've seen here.

Back to Top

 

Return to Series Index

 

 

 

  

© 2008, Andrew Comeau, except where otherwise noted. Material contained on this website should not be republished without permission from the author. Questions?  Comments?  You can e-mail me at this address.

Microsoft is a registered trademark of Microsoft Corporation in the United States and other countries.