Parsing – or the Art of Pulling Things
Apart
In most of life putting things together is a whole lot more
difficult then taking things apart. With databases, however, just
the opposite is true. Pulling things apart requires significantly
more effort than putting them together. In last month's issue we
talked about atomicity and how easy it is to put elements together
as long as they are stored in their smallest logical pieces. As you
may recall, I encouraged you to carefully consider atomicity as you
are planning and designing your databases, and I also showed you how
to put elements together as needed.
This month we'll tackle a problem that I frequently see - the
case where multiple data elements are stored in a single field. To
get them apart we'll use parsing.
The Building Blocks of Parsing |
|
Let's say you have a customer database and for whatever
reason, the first and last names of your customers are stored
in a single field called FullName, as shown below:
What do you do if you want only the last name of each
customer, or the first and last name in a different order (as
would be required if you were writing a form letter to your
customers)? The answer is parsing, or taking data apart. You
can write a query to isolate, or parse, the data elements you
need.
Let's imagine that we want to write a form letter to our
customers. We'll need to parse the field FullName in our
database to extract their first and last names to change the
order. Our plan of attack has three basic steps:
|
Step 1: Locating the Lever |
|
First, determine how you can separate the pieces of data.
In our example, we have a comma between the last name and
first name, so we can use the comma as a lever to separate the
two pieces. Other characters, like semi-colons, slashes, or
dashes will also work, as long as they are used consistently
in a given field. Spaces can work too, but they're a little
tougher to work with since they may appear multiple times.
In this first step, we'll write a query using an Access
function called "InStr" to determine where the comma falls in
the string. InStr, which can be used to locate a character or
a fixed string of characters, will give us a number that tells
us the character location of our lever (in this case, a comma)
in our FullName string containing [LastName], [FirstName].
Here is an example of a query using the function InStr to
determine the position of the comma in the field FullName:
To use the function InStr, we have three arguments within
the parentheses: first is which character we should start
searching from (in our case the very first character - 1;
second, the name of the field (FullName); and third, the
character (or string of characters) we want to locate, in
quotes. These arguments are separated by commas.
When you run this query, you get the following results:
|
Step 2: Isolating the Last Name |
|
Now that we know where the comma occurs in the string, the
second step is to run a query to extract everything to the
left of the comma and put it in a separate field labeled Last
Name. We do this by utilizing a function called "Left". The
Left function returns all the characters to the left of the
position of the lever (in our case, the comma) that we
identified using the InStr function in step one.
So we create a query as follows:
To use the function Left, we have two arguments within the
parentheses: the first argument is the string we are starting
with, and the second argument is the number of characters to
return. This piece is just a tad tricky since it changes from
record to record, that's why we're using the InStr function
(from Step one) as our second argument.
As you can see the InStr function is nested as an argument
within the Left function in this query. The innermost function
InStr returns a number that determines the position of the
comma in the string of characters that make up [LastName],
[FirstName] (our FullName field). Then, the function Left
extracts all the characters to the left of the comma. Notice
the -1 at the end of the line? This is actually a mathematical
calculation that tells Access that we don't want the comma-we
want only the characters starting one to the left of the comma
and continuing leftward to the beginning of the string. These
characters will be reported in a column labeled Last Name.
When we run this we finally get what we want:
|
Step 3: Isolating the First Name |
|
Although the process of isolating the first name is similar
to isolating the last name, it is slightly more complicated.
At the onset, let me warn that this step is actually more
complicated to explain then it is to do. This is due to the
fact that we have to determine the endpoint of the string to
the right of the comma (our lever and reference point) to
isolate the first name (theoretically, the string could be
infinitely long). Stay with me, because once you understand
the principle behind this technique, you'll be able to
experiment to get the results you want.
To parse the first name, you'll use a routine similar to
isolating the last name, except this time you'd use the
"Right" function along with another function called "Len". The
Len function takes a single argument -- the string you want to
know the length of. In our case we use Len(FullName). For both
"Hawke, Thomas" and "Litter, Kitty" that function returns 13,
for "Nasium, Jim" we get 11. Keep in mind that things like
commas, spaces and periods are characters, even though in some
cases (like spaces) we can't see them. We'll use the Len
function and a mathematical calculation to isolate the first
name string.
Here is the query, which shows three nested functions
(InStr, Len, and Right):
To use the function Right, we have two arguments within the
parentheses: the first argument is the string we are starting
with (FullName), and the second argument indicates the number
of characters we want to return.
Again, the tricky part of this step is that we have to tell
Access a number that depends on the string we start with. When
we used the Left function, the query returned all the
characters from the comma to the left, all the way back to the
beginning of the string. In this case, our string to the right
of the comma could be infinitely long. That is why it is
necessary to do some math in the query.
When you run this you get:
OK, so let's pull this one apart to make sure it's clear:
We use the InStr function to determine where the
comma lies in the string FullName. Remember, this function
returns a number.
We use the function Len to give us a number that
tells us the length of the string FullName.
We tell Access that we want the rightmost
characters from the string from one space after the comma.
As you can see in the query, we are doing a mathematical
calculation to get the number of characters we want:
[the total number of characters] - [the length of the
string up to the comma] - 1 (to get rid of the space between
the comma and the first name).
This gives us the
number of characters in the first name.
At this point you have parsed out the First Name and the
Last Name into separate fields. Writing the greeting in your
form letter is as simple as joining (or concatenating) them
together in the correct order as we talked about in the
November Wizard.
You can imagine that if you had a middle initial or middle
name as well, life would become even more complex. In that
case you could use similar techniques and focus on the string
after you've stripped out the first name. Once it gets to that
level of complexity though, it's much easier to do it in
Visual Basic for Applications (VBA) code. VBA is considerably
easier to troubleshoot, and it allows you to visualize more
clearly what's going on. Check out your local bookstore or
library if you are interested in learning more about VBA. They
may have some books to get you started.
|
Conclusion |
|
As you can see, parsing involves considerably more work
than the joining (or concatenating) we did last month. This is
a factor which should motivate you take the time to analyze
(and atomize) your data in the beginning, and set up your
tables and fields to avoid this problem altogether.
However, when you are faced with more than one piece of
data meshed into a single field, you have the tools of parsing
available to you to help pull them apart.
|
Trap of the Month |
|
Have you ever noticed that sometimes when you open a form
you get the property box? I personally find this behavior of
Access really annoying. What's going on is that because you
had the property box open when you last saved the form in
design view, you get it when you run the form. To make this
problem go away, follow these steps:
Open the form in design view
Select the form by clicking the grey box in the
upper left hand corner (you should already have the properties
box visible, but if not choose View, Properties)
Click on the tab marked Other
In "Allow Design Changes" at the bottom of the
list, choose "Design View Only"
|
|
Tip of the Month - Recover from those Oh-Oh
Moments! |
|
Have you been working in Access and had one of those
"Oh-Oh" moments? You just accidentally hit the backspace key
and deleted something, or changed something that you really
didn't want to change. In many cases you can recover using
"undo". Simply hit Ctrl-Z or choose "undo" from the Edit menu.
It won't work in every single case (sometimes what you do is
permanent), but if you run into a situation where you wish you
could take back those last keystrokes, give "undo" a try and
see if you can easily recover. |
|