Jump to content
Salesforce and other SMB Solutions are coming soon. ×

Large import from excel with unique id questions...


bdpla
 Share

Recommended Posts

I'm fairly new to FMP and I am running into a couple problems setting up my new solution. I've been working through the VTC tutorials and I guess I am not far enough yet.

 

I have a large amount of scraped data broken down into 3 categories - companies, contacts and projects essentially. I had a unique ID generated during the scrape for each project, but not for the contacts of the companies. Unfortunately, Im dealing with a huge number of duplicate companies and contacts dues to the raw dump of data. I was planning on validating the data during import, generating a serial number for both the contacts and the companies so I could use the projects table as a join table and a many to many setup. I couldn't figure out how to merge the duplicates during import by validation and to save the project id reference from the deleted company record to the merged company record.

 

Any direction would be greatly appreciated.

 

Bryan

Link to comment
Share on other sites

how to merge the duplicates during import by validation

Not sure if that's even possible automatically.

 

Importing the duplicate data and then removing the duplicates is possible however using a script.

If you have a big flat file import of an excel sheet, then I think I'd create an import table that holds all values.

then:

- prewash the fields using the TrimAll() function to remove extraneous spaces

- create a temp field concatenating for each table type (contacts = first name + last name + company name for instance) to identify unique records

- sort by that field

- You should consider manually checking your data because in the raw data, typos may be abundant and prevent you from recognising two contacts as identical (f.i. John Brown and John Brwon).

- Then loop through the records, comparing the temp field with the previous field and setting the serial ID for the target table +1 if necessary

- Repeat this for the other target tables

- assuming all ecxel rows are unique projects, import the temp table as a whole into the projects table including the newly created serial IDs as foreign keys for contacts and companies (only the relevant fields).

- then import same into contact table and company table (only the relevant fields).

- then run a 'remove duplicates' script over Contacts and Companies tables based on serial ID. I think there are several examples of such scripts on this site.

- remember to import the Company ID into the Contacts table also.

 

If you need more help, please post again.

Link to comment
Share on other sites

Maarten,

 

Thanks for the response. I feared that was the case. I will attempt to work through your steps although Ill be honest, I'm not as versed in FMP as I'd like to be so it will take me some time.

 

What shall I do about the project ID's that are attached to the duplicate entries? A project may have 1-5 companies associated with it but some of the companies have close to 150 projects in progress. ABC TV for instance has only one mailing address but 134 projects so I need to make sure the project associations still exist after I merge the contact entries. Do repeating fields work for this? Should I import a full table with all the project id's and duplicate companies to act as a reference table?

 

Here is what I'm referring to:

https://www.box.com/s/ab24c4294afc62675469

Link to comment
Share on other sites

What shall I do about the project ID's that are attached to the duplicate entries? A project may have 1-5 companies associated with it but some of the companies have close to 150 projects in progress.

 

Oh I misunderstood that. Excuse me for reading over the reference to a many-to-many setup you made in the first post. But your join table is not going to be the projects table. There should be a separate join table holding company IDs and project IDs. So you would have to do the procedure of finding the duplicate projects also, and import stuff into the join table also before removing the duplicate projects.

 

Hope that helps. And I am assuming contacts are linked to companies and not directly to projects? Or is that necessary also? f.i. if different contacts of the same company are related to different projects? Then you need another join table for that I guess.

Link to comment
Share on other sites

This thread is quite old. Please start a new thread rather than reviving this one.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

 Share



×
×
  • Create New...

Important Information

Terms of Use