Jump to content
Salesforce and other SMB Solutions are coming soon. ×

Parsing Address field from excel upload??


pasternik
 Share

Recommended Posts

Hi! Its me again!!

 

OK, I have some new issues I need to work on and fast!!

 

I have attached an Excel 2003 worksheet as an example for part of my problem. I have Indexed spreadsheets that contain information about the fractional interests of Mineral Owners. They start from the year 1905 to present. On the spreadshhet there are Grantors and Grantees, and Legal Descriptions, etc. One Grantor may be listed over time, 20 different times, his address may change during that period, and we track addresses starting from 1995 to present. A Grantor may own a fractional interest in Minerals in a certain Legal or in multiple Legals. He may give some of his fractional interest or ALL, to a Grantee (a Bank, another Person, an Hier, etc).

 

Should I set up separate tables for the Grantors and Grantees?

Or table occurances?

 

Should I give a separate serial # to each Grantor and Grantee?

 

Should I make a separate table for the Legals? (Legals are not the same acrross the board, so there is no way to make a global field. Legals are arranged by "Section, Township, and Range" also! IE: Sec 30-2N-12E may have 15-30 diffrent Legal Descriptions per).

 

I would also like to parse the address, and SS# out of the Grantors and Grantees fields, AFTER 1995.

 

Can anyone help?

Jenni

 

P.S. None of the info provided in the example is correct as far as addresses and Socials, so please noboby get any ideas. I do work with personal information, but it is always kept safe and secure.

Link to comment
Share on other sites

Hmmm, seems you have a lot of work on your hands!

 

As far as I can see, the spreadsheet shows granting transactions between grantor and grantee. So Transactions is one entity (table).

 

In answer to one of your questions, I see no reason to have separate tables for grantors and grantees. They should be in one table and assigned serial numbers for identification in the transactions.

 

The parsing process I would do as a separate exercise. You need to extract the information and insert the serial numbers in each transaction. Then you need to identify duplicates and merge them while keeping different addresses through time.

 

Simple? As I said, it is a lot of work that needs to be done carefully.

 

Should you make a separate table for the Legals? Can't answer that because I don't understand the purpose of the data.

 

As for your PS - I thought "SS # 000-00-0000" looked real! And I bet their phone numbers all start with 555!

 

Good luck!

Link to comment
Share on other sites

As an example, for those who do not want to look at the attachment..

 

This is what needs to be parsed..

 

 

Rufus H. Ott, a married man dealing with his sole and separate property

102 E. 7th Ave

Escondido, CA 92025

SS # 000-00-0000

 

 

Grantor = Rufus H. Ott, a married man dealing with his sole and separate property

 

Address = 102 E. 7th Ave

 

City = Escondido

 

State = CA

 

Zip = 92025

 

SS # = 000-00-0000

 

These are all separate fields in the database I am trying build, Rightwords, and Leftwords are just not going to work in this situation... Can anyone help?

 

Thank you for reading!! smiley-laughing

 

Jenni

Link to comment
Share on other sites

Ummm, I don't know how you read my post but there was no sarcasm intended. If it was about the SS# comment, "that was a joke Joyce" - but may have been lost in the translation from Australian. Not intended to be sarcastic or condescending. Apologies if it was.

 

I did give you advice on table structure but the parsing exercise is not a simple job. As for a how-to, this might get you started:

 

data (text) - contains your data to be parsed
grantor = GetValue ( data ; 1 )
person = Case(PatternCount ( GetValue ( data ; 1 ) ; "," ); Left ( data ; Position ( data ; "," ; 1 ; 1 ) - 1 ); GetValue ( data ; 1 ) )
address = GetValue ( data ; 2 )
city = Left ( GetValue ( data ; 3 ) ; Position ( GetValue ( data ; 3 ) ; "," ; 1 ; 1 ) - 1 )
state = Left ( Middle ( GetValue ( data ; 3 ) ; Position ( GetValue ( data ; 3 ) ; ", " ; 1 ; 1 ) + 2 ; 999 ) ; 2 )
zip = RightWords ( GetValue ( data ; 3 ) ; 1 )
ss1 = Middle( data; Position ( data ; "SS # " ; 1 ; 1 ) + 5; 11)

 

And for extra credit to account for the examples in the spreadsheet:

 

ss2 = Case( PatternCount ( data ; "SS # " ) > 1; Middle( data; Position ( data ; "SS # " ; 1 ; 2 ) + 5; 11) )
person_aka = Case( PatternCount ( GetValue ( data ; 1 ) ; "aka" ); 
Middle( GetValue ( data ; 1 ); 
       Position ( GetValue ( data ; 1 ) ; " aka " ; 1 ; 1 ) + 5;  
       Position ( GetValue ( data ; 1 ) ; "," ; 1 ; 2 ) - Position ( GetValue ( data ; 1 ) ; " aka " ; 1 ; 1 ) - 5 )  )

 

These extra calculations extract the person's aka if there is one, and the second SS# if there is one.

 

Assumptions made about the data include:

1. there are returns in the field between lines i.e. name address city, state, zip SS#. This makes the data effectively a value list.

2. there is a comma following the person's name and if there is not, the whole first value is the name

3. there is a second comma following the person's aka if present

4. there is a comma following the city name

5. state abbreviations are two letters

6. SS# is always written as "SS # 000-00-0000"

 

See how you go with that. In most parsing exercises, you can hope for about 95% success rate with manual checking required to catch exceptions.

 

I can post my example file if that would help. If any of the calculations need explanation, I can do that too.

 

Have fun!

Link to comment
Share on other sites

Wow David!!!

 

I haven't had the chance to implement your code yet, but I want to THANK YOU so much for your help. I'm sorry for my earlier post. Its been a long frustrating day trying to write code that was never going to work.

As far as your questions go:

 

1. there are returns in the field between lines i.e. name address city, state, zip SS#. This makes the data effectively a value list. YES there are returns between each

 

2. there is a comma following the person's name and if there is not, the whole first value is the name WILL CHECK, in most cases YES

 

3. there is a second comma following the person's aka if present YES, in almost every case

 

4. there is a comma following the city name YES

 

5. state abbreviations are two letters UNFORTUNATELY not in every case

 

6. SS# is always written as "SS # 000-00-0000" IF its not the will delete "SS #" in the Calc. and leave it as shown

 

I would love to see your example file, if you don't mind, this code must have taken you hours to come up with, unless of course your just that good, which I don't doubt in any way! BTW, your website is awesome!! I may have more questions about the Calc's but for now, until i do some testing I'm going to use your exact code here and see how it goes.

 

As for the Legal descriptions in my earlier posts. A Legal Description is a break down of Longitude and Latitude, in the U.S., (or at least here in Oklahoma). Each "Township" and "Range" has 36 "Sections". Each Section contains 640 Acres, which are then broke down into Tracts, every Tract has a Legal Description of where to pinpoint it on a map. Just about every tract is of a different size, anywhere from .5 acres to the whole 640 acres (which is rare). Do you think these should be in a separate table such as:

 

legal_id

section

township

range

legal

 

Every Grantor/Grantee is attached to one, or more, specifc Legal, Sec., Twnshp, and Rng. My thinking is it needs to be on the original table because the Legals are so different in most cases.

 

You are an unbelievable help, David, sorry for my pissy mood earlier.

 

Thanks so much!!

 

Jenni

Link to comment
Share on other sites

David,

 

I have had a chance to use your parsing solution, and it works great for alot of the Data I am working with, but I'd like to see if you can help me make a couple of changes.

 

I made a copy of the Person field, and changed it to this:

 

Case(PatternCount ( GetValue ( data ; 1 ) ; "¶" ); Left ( data ; Position ( data ; "," ; 1 ; 1 ) - 1 ); GetValue ( data ; 1 ) )

 

This grabs all of the data, including any aka's, up to the return before the address. That works great!!

 

Now on the State field, I would like to see if we can look for the comma after the City and grab what ever is in between the spaces, instead of a just a 2 letter State.

 

Example:

 

Edmond, Whatever 74857 or Edmond, WR 74857

 

The SS # works great for all Socials that are formatted exactly that way, but as I was looking thru some of the data that was parsed, I have found different formatting throughout. Such as "SS #" "SS#" "SSN" "Tax Id" "Tax Id#" and "Tax Id #", the formatting for those is all over the place. So I was thinking, all of these have at least the number format the same way.

 

Example:

 

SSN "000-00-0000" and Tax id "00-0000000"

 

Is it possible to do a patterncount to look for these specific formats instead, grab any data behind it and look for the return? Because some of them are followed by initials, in brackets, of the person it belongs to. This might cause a problem if there is no return after the SSN, BUT there is always a return before a second SSN, if there is one......

 

Example:

 

SSN 000-00-0000(JKL)

SSN 000-00-0000(JRD)

 

I am so excited that this is working as well as it is, even with a few gliches, you are *GOLDEN* David!!

 

Hope to here from ya soon,

 

Jenni

Link to comment
Share on other sites

I made a copy of the Person field, and changed it to this:

 

Case(PatternCount ( GetValue ( data ; 1 ) ; "¶" ); Left ( data ; Position ( data ; "," ; 1 ; 1 ) - 1 ); GetValue ( data ; 1 ) )

 

This grabs all of the data, including any aka's, up to the return before the address. That works great!!

 

Actually, all you need for that is:

 

GetValue ( data ; 1 )

 

This gets the first "value" - everything up to the first return character.

It is exactly what your calculation returns every time because the test:

 

PatternCount ( GetValue ( data ; 1 ) ; "¶" )

 

is always false - there is never a return in a value. :cool:

 

Now on the State field, I would like to see if we can look for the comma after the City and grab what ever is in between the spaces, instead of a just a 2 letter State.

 

So what you need is to get all the words after the comma except for the last word. The calculation got a bit nasty so I set it up in a Let statement to better document it (could do that with others too):

 

Let ( [ 
citystatezip = GetValue ( data ; 3 ) ; 
commapos = Position ( citystatezip ; ", " ; 1 ; 1 ) ;
statezip = Right( citystatezip; Length( citystatezip ) - commapos - 1 ) ;
statezipwords = WordCount ( statezip )
] ; 
LeftWords ( statezip ; statezipwords - 1 )
)

 

The SS # works great for all Socials that are formatted exactly that way, but as I was looking thru some of the data that was parsed, I have found different formatting throughout. Such as "SS #" "SS#" "SSN" "Tax Id" "Tax Id#" and "Tax Id #", the formatting for those is all over the place.

 

My new method will extract the values with the SS numbers, extract the digits from those values, and then reform them into the correct formatting. So the first SSN is extracted using:

 

Let ( [
ssntext = GetValue ( data ; 4 ) & GetValue ( data ; 5 ) ; 
ssnumbers = Filter ( ssntext ; "0123456789" );
ssncount = Length(ssnumbers) / 9
] ;
Case ( ssncount ; Left ( ssnumbers ; 3 ) & "-" & Middle ( ssnumbers ; 4 ; 2 ) & "-" & Middle ( ssnumbers ; 6 ; 4 ) ) 
)

 

And the second SSN (where it exists) is extracted using:

 

Let ( [
ssntext = GetValue ( data ; 4 ) & GetValue ( data ; 5 ) ; 
ssnumbers = Filter ( ssntext ; "0123456789" );
ssncount = Length(ssnumbers) / 9
] ;
Case ( ssncount > 1 ; Middle ( ssnumbers ; 10; 3 ) & "-" & Middle ( ssnumbers ; 13 ; 2 ) & "-" & Middle ( ssnumbers ; 15 ; 4 ) ) 
)

 

So that's about it for parsing that baby!!! Modified file attached.

 

Nice to see someone excited about text parsing ;) Geeks rule!!!

Link to comment
Share on other sites

You are a genius David!!!!

 

I would kiss you right now if you weren't in Austrailia!!! You have saved me countless hours, days, and maybe even a month's worth of trying to figure that out on my own.

 

Everything works perfectly, but can I ask one last favor? I mentioned the Tax Id # in the last post it is formatted as 00-0000000, a Tax Id is for a business and a SSN is for a person. I was wondering if there is any way your ss1alt & ss2alt can test for the existing pattern first, before it rewrites as 000-00-0000? I am not complaining, I am absolutely amazed that you have the abilty to write such a complicated code, and if its too much to ask, then thats ok, I'll leave it as is. Cheers! Yeah

 

Thank you again for all your help, I couldn't have done it with out you, at all!!

 

Jenni

Link to comment
Share on other sites

You are a genius David!!!!

 

In the words of the King "Thank you very much"!!!

 

I would kiss you right now if you weren't in Austrailia!!!

 

Woohoo! Yeah Yeah Yeah Maybe I need to hop on a plane ;) But what would my wife think?!

 

You have saved me countless hours, days, and maybe even a month's worth of trying to figure that out on my own.

 

Don't you just love that?! I suspect your Café subscription fee has paid for itself?

 

Everything works perfectly, but can I ask one last favor? I mentioned the Tax Id # in the last post it is formatted as 00-0000000, a Tax Id is for a business and a SSN is for a person. I was wondering if there is any way your ss1alt & ss2alt can test for the existing pattern first, before it rewrites as 000-00-0000?

 

OK, let's test for the existence of the string "tax" and format differently if it exists:

 

ss1alt =
Let ( [
ssntext = GetValue ( data ; 4 ) & GetValue ( data ; 5 ) ; 
ssnumbers = Filter ( ssntext ; "0123456789" );
ssncount = Length(ssnumbers) / 9;
type = Case( PatternCount ( ssntext ; "tax" ); 1; PatternCount ( ssntext ; "ss" ); 2; 0) // 1 = tax number, 2 = SS number
] ;
Case ( ssncount ; 
Case (type = 1; Left ( ssnumbers ; 2 ) & "-" & Middle ( ssnumbers ; 3 ; 7 );
type = 2; Left ( ssnumbers ; 3 ) & "-" & Middle ( ssnumbers ; 4 ; 2 ) & "-" & Middle ( ssnumbers ; 6 ; 4 ))
) 
)

ss2alt =
Let ( [
ssntext = GetValue ( data ; 4 ) & GetValue ( data ; 5 ) ; 
ssnumbers = Filter ( ssntext ; "0123456789" );
ssncount = Length(ssnumbers) / 9;
type = Case( PatternCount ( ssntext ; "tax" ); 1; PatternCount ( ssntext ; "ss" ); 2; 0) // 1 = tax number, 2 = SS number
] ;
Case ( ssncount > 1; 
Case (type = 1; Middle ( ssnumbers ; 10; 2 ) & "-" & Middle ( ssnumbers ; 12 ; 7 );
type = 2; Middle ( ssnumbers ; 10; 3 ) & "-" & Middle ( ssnumbers ; 13 ; 2 ) & "-" & Middle ( ssnumbers ; 15 ; 4 ))
) 
)

 

So there you go - your final wish is granted!

 

Thank you again for all your help, I couldn't have done it with out you, at all!!

 

No problems. Glad to help :cool:

Link to comment
Share on other sites

Unreal!!! I can't believe this, everything is perfectly matching up!!

 

Geeks do rule!!! Rolling Grin!

 

 

Woohoo! Yeah Maybe I need to hop on a plane But what would my wife think?!:

Oops!! Opps! Tell her I said Hi!!

 

 

 

Don't you just love that?! I suspect your Café subscription fee has paid for itself?:

David is the God Angel with Grin! of Code!!

 

 

 

 

So there you go - your final wish is granted!:

Thank you master!!!! Yeah Yeah Yeah

 

Jenni

Link to comment
Share on other sites

This thread is quite old. Please start a new thread rather than reviving this one.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

 Share



×
×
  • Create New...

Important Information

Terms of Use