FrereGenetics

Digesting specific codes from txt files


Maarten Witberg
eliminate any user prep whatsoever and have FMP do everything for them

OK, you got that covered now? I feel like I am running to keep up with you. With a host of loose questions, and with me or another helper here groping around in the dark as to your system setup, the context of use (you or a user group), the function of the database, etc., be aware of the genie-in-a-lamp syndrome: they give you precisely what you ask for, but this may very well not be what you want.

FrereGenetics

Haha, no, you have been invaluably helpful. We can now process this stuff at a speed we hadn't thought possible.

 

My final question of the week:

 

Now that I have a series of extracted strings in one database, is there any way to match each string against any part of any string in the second database (the catalog) and have it return the page number of that string in the catalog? (A field that contains a serial value instead of the page number would be even more helpful.) And since a string could appear on multiple pages, is there a way to have it return every match?

 

ex:

search string "CATAG"

Catalog page 1: "CTTCAG'CATAG'GCAAC" match

Catalog page 2: "CCTATTGTAACAACCCC" no match

Catalog page 3: "CCT'CATAG'GGACCTGAT" match

 

I want to be able to record on the original page for CATAG that it matches records 1 and 3, plus however many more it matches.
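
For readers outside FileMaker, the request above can be sketched in Python as a plain scan of the catalog. This is only an illustration of the matching rule, not the FileMaker solution discussed in this thread; the function name `find_matches` and the dictionary layout (serial number mapped to sequence) are assumptions made for the example, and the quote marks around CATAG in the example data are dropped.

```python
def find_matches(search, catalog):
    """Return the serial numbers of every catalog record that contains `search`."""
    return [serial for serial, sequence in catalog.items() if search in sequence]


# The three example catalog pages from the post, keyed by a serial number.
catalog = {
    1: "CTTCAGCATAGGCAAC",
    2: "CCTATTGTAACAACCCC",
    3: "CCTCATAGGGACCTGAT",
}

print(find_matches("CATAG", catalog))  # [1, 3]
```

A scan like this looks at every catalog record for every search string, which is the cost the relational approach in the following replies tries to avoid.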

Maarten Witberg
I want to be able to add to the original page for CATAG that it matches both records 1+3 and however many more it matches

Search string can be any string, any length, not just CATAG? How long is the catalog page string?

If you want to do this relationally, as far as I know you really need a custom function; there is no way to do it reliably with a normal calculation.

You could do it scripted I guess using my original calc, but that would take ages to perform, with the millions of records you're handling.

 

The trick is to parse the catalog string out into all of its substrings, starting from each letter in turn:

CTTCAG'CATAG'GCAAC

C
CT
CTT
CTTC
...
T
TT
TTC
TTCA
...
...
G
GC
GCA
GCAA
GCAAC

 

and match that as a multikey field to the parsed string field in the data table. A custom function will do this work for you. Check briandunning.com; perhaps there's one already in the list that does that (not explodedkey, though: that only takes strings starting from the first letter, not from the second letter onward).
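
As a rough illustration of the multikey idea, here is a minimal Python sketch. It is not the custom function itself: `explode` and `build_index` are hypothetical names, and a dictionary of sets stands in for the FileMaker relationship between the parsed string and the exploded key.

```python
from collections import defaultdict


def explode(sequence):
    """Return every distinct substring of `sequence`: all start positions, all lengths."""
    n = len(sequence)
    return {sequence[i:j] for i in range(n) for j in range(i + 1, n + 1)}


def build_index(catalog):
    """Map each substring to the set of catalog serial numbers whose sequence contains it."""
    index = defaultdict(set)
    for serial, sequence in catalog.items():
        for key in explode(sequence):
            index[key].add(serial)
    return index


catalog = {
    1: "CTTCAGCATAGGCAAC",
    2: "CCTATTGTAACAACCCC",
    3: "CCTCATAGGGACCTGAT",
}
index = build_index(catalog)

# One lookup per search string, with no scan of the catalog.
print(sorted(index.get("CATAG", set())))  # [1, 3]
```

The work moves to building the index once per catalog entry; after that, each search string is a single lookup that returns every matching record.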

 

or just a field that contains a serial value instead of the page number, this would be more helpful

use an auto-enter serial field upon importing the raw data.

 

 

that's all from me tonight, be back tomorrow.

FrereGenetics

Yes each search string varies in length and there are thousands of strings.

In addition there are thousands of strings to search from that may include multiple matches.

 

Do custom functions work on FMP 7?

 

How do you report multiple matches in the search-for string's record page?

Maarten Witberg

Yes, custom functions work in v7, but you need the Developer edition to be able to define them (for use, ordinary v7 Pro is OK). If you upgrade, you're probably going straight to ver. 9 Advanced unless you can get a copy from Amazon or something. Or find someone who will perform the service of pasting the CF into your solution so you can access it.

 

Multiple matches: in a relational database, I'd use a portal for on screen viewing. Printed, you make a list view with the parent data in the header.

 

If you post say a dozen strings from the catalog, I can modify the sample I posted above. I do have to ask you to upgrade your membership so you can download samples from the site - this helps keep the site online and saves space on my ftp site (ya like 100k is going to make a difference).

FrereGenetics

Some catalog entries:

 TTCACTGTGGGATGAGGTAGTAGGTTGTATAGTTTTAGGGTCACACCCACCACTGGGAGATAACTATACAATCTACTGTCTTTCCTAAGGTGAT

CTGCATGTTCCCAGGTTGAGGTAGTAGGTTGTATAGTTTAGAGTTACATCAAGGGAGATAACTGTACAGCCTCCTAGCTTTCCTTGGGACTTGCAC

GCAGGGTGAGGTAGTAGGTTGTGTGGTTTCAGGGCAGTGATGTTGCCCCTCCGAAGATAACTATACAACCTACTGCCTTCCCTGA

TGTGTGCATCCGGGTTGAGGTAGTAGGTTGTATGGTTTAGAGTTACACCCTGGGAGTTAACTGTACAACCTTCTAGCTTTCCTTGGAGCACACT

ACGGCCTTTGGGGTGAGGTAGTAGGTTGTATGGTTTTGGGCTCTGCCCCGCTCTGCGGTAACTATACAATCTACTGTCTTTCCTGAAGTGGCCGC

AATGGGTTCCTAGGAAGAGGTAGTAGGTTGCATAGTTTTAGGGCAGAGATTTTGCCCACAAGGAGTTAACTATACGACCTGCTGCCTTTCTTAGGGCCTTATT

ATCAGAGTGAGGTAGTAGATTGTATAGTTGTGGGGTAGTGATTTTACCCTGTTTAGGAGATAACTATACAATCTATTGCCTTCCCTGAG

TGTGGGATGAGGTAGTAGATTGTATAGTTTTAGGGTCATACCCCATCTTGGAGATAACTATACAGTCTACTGTCTTTCCCACG

CCAGGCTGAGGTAGTAGTTTGTACAGTTTGAGGGTCTATGATACCACCCGGTACAGGAGATAACTGTACAGGCCACTGCCTTGCCAGG

CTGGCTGAGGTAGTAGTTTGTGCTGTTGGTCGGGTTGTGACATTGCCCGCTGTGGAGATAACTGCGCAAGCTACTGCCTTGCTAG

 

There are around a thousand of these, and the strings we need to match against them number in the hundreds of thousands. So say I needed to find "TGTACAGTT" in those sequences: is this something that can be done without a custom function? I'm worried this is just too specific for a premade custom function.

 

P.S. Where's the info on upgrading membership? Y'all have been more than helpful.

Maarten Witberg
I'm worried this is just too specific for a premade custom function

No, I don't think so (i had been precooking this).

 

A custom function is "custom" in the sense that the developer can create his own functions. As long as a CF does not have any hardcoded values it should be able to handle any data just like a normal, prebuilt function. And I think that while it may be possible to do this without a cf, your solution will become unwieldy and slow and difficult to maintain.

 

For upgrading membership: check "paid subscriptions" in your user control panel (left column). To be clear on the matter: your subscription will help the site; but all helpfuls ("mentors") and moderators are volunteers. Still, thanks for upgrading.

 

PS I'll make a sample tonight.

FrereGenetics

so i assume this would go into my find criteria....?

 

In other words, how do i search for my string in the exploded key outputs?

Maarten Witberg

No, you use ParsedString in the first table to match against the exploded key in the catalog table. You can do a find just by typing *catag* into the catalog, and you can even script this and make do without the exploded key, but to do this with 100,000 records, all with different strings to match... oof. That's why I jumped to the relational solution. Just let me do the sample later on, and you'll see what I'm getting at.

FrereGenetics

When I type catag into the catalog, it doesn't return anything, because catag is inside the strings I'm searching through. It's not the entire string.

Maarten Witberg

No, use the wildcard * (it signifies zero or more characters): type *catag* rather than catag.
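
The difference the wildcard makes can be illustrated with Python's shell-style pattern matching. This is only an analogy for the * operator; FileMaker's find has its own matching rules, and the variable names here are made up for the example.

```python
from fnmatch import fnmatchcase

sequence = "CTTCAGCATAGGCAAC"

# Without wildcards, the pattern has to equal the whole string.
bare = fnmatchcase(sequence, "CATAG")

# With * on both sides, anything (or nothing) may surround the search string.
wrapped = fnmatchcase(sequence, "*CATAG*")

print(bare, wrapped)  # False True
```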

Maarten Witberg

http://home.planet.nl/~witbe001/ParseSequencesSortCat.fp7.zip

 

here's an extended sample with the matching functionality for catalog items installed.

 

You can either show/hide a portal that shows the matching records, or click on the "catalog match" number to get a popup with the sequence match color-coded in the catalog item.

 

Of course, the interface must be adapted to what you want to do with all this.

 

Let me know how it performs in a large file....

FrereGenetics

wow.

 

this is awesome. This field (AllMatchesID) is missing a function though

 

Substitute ( (Catalog::Display) ; InterestingSequence ; TextColor ( InterestingSequence ; RGB (250 ; 0 ; 200 ) ) )
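
For anyone reading without FileMaker at hand: the calculation above replaces every occurrence of InterestingSequence in Catalog::Display with a colored copy of itself. A rough Python equivalent, with square brackets standing in for the color (the function name `highlight` and the bracket markers are assumptions for this sketch), might look like this:

```python
def highlight(display, interesting, open_mark="[", close_mark="]"):
    """Wrap every occurrence of `interesting` inside `display` in marker characters."""
    if not interesting:
        # An empty search string would otherwise match between every character.
        return display
    return display.replace(interesting, f"{open_mark}{interesting}{close_mark}")


print(highlight("CTTCAGCATAGGCAAC", "CATAG"))  # CTTCAG[CATAG]GCAAC
```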

FrereGenetics

and what does the permutations button do?

FrereGenetics

Not button, field: what do the permutations field and the number of permutations field do?

Maarten Witberg

Oops, my bad. That is the List ( ) function, which is only present as of version 8.5. Didn't think about that. Anyway, you've got the portal. I'll try to make a ver. 7 workaround for it later on.

Maarten Witberg

number of permutations simply returns the number of values in the exploded key. You see how the name of the function aptly describes what happens here :)
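
To give a feel for how large that number gets, here is a small Python sketch. It assumes the exploded key holds every substring of the sequence; whether the sample file stores repeated substrings once or several times is not stated in this thread, so both counts are shown. The function names are made up for the example.

```python
def substring_count(n):
    """Number of (start, length) combinations in a string of n letters: n(n+1)/2."""
    return n * (n + 1) // 2


def distinct_substring_count(sequence):
    """Number of different substrings, counting repeats only once."""
    n = len(sequence)
    return len({sequence[i:j] for i in range(n) for j in range(i + 1, n + 1)})


print(substring_count(16))              # 136
print(substring_count(100))             # 5050
print(distinct_substring_count("CTT"))  # 5
```

So a catalog entry of around 100 letters can produce a few thousand key values.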

FrereGenetics

It's perfect, but I'm sorry, I'm still confused about the sequence exploded permutations. Only one letter appears in sequenceexplodedpermutations, and the number of permutations is in the thousands.

FrereGenetics

And how come every time I move the catalog window or its scroll bar, it has to redisplay every value (pretty slowly, too)? I'm worried about its speed when I have thousands of catalog entries.

Maarten Witberg

ah, that field is fitted with a scroll bar. go to form or list view to get an idea...

Maarten Witberg

I don't have a speed issue :confused:

FrereGenetics

I tried to export my records from my gene catalog and use import to enter them into yours, and it took ten minutes to do just 400. I'll probably have to do a couple thousand at least. I'm guessing that's just because it's filtering everything as it goes, i.e. doing tons of calcs while importing. The only thing I'm worried about is how long it takes to scroll, but again, that is probably a calc thing. Do you think it's a Mac character set issue?

Maarten Witberg
I'm guessing that's just because it's filtering everything as it goes, i.e. doing tons of calcs while importing.

 

Yup, that would be it. I have not given much thought to optimizing, other than trying to use as many stored calcs as possible.

 

do you think its a mac character set issue?

don't think so.

 

Please let me know how it performs after the import. It should be faster.

FrereGenetics

And also, I'm still curious what the sequence exploded permutations field is, because it's a huge space with only one letter on each line.

Maarten Witberg

see post #51

FrereGenetics

Excellent, sorry. Blind.

I imported my sequences into the field for the strings that need to be filtered, parsesequences::sequence. They are all there, but only the first one was filtered and matched up.

FrereGenetics

Is it because I had to change the serial number from an actual serial number to the serial number we use? If so, I can change that back and do it a different way.

FrereGenetics

yea it was my bad answered my own question

Maarten Witberg

No, the serial has nothing to do with it. The limiting sequence is empty. I didn't mention that I made it into an auto-enter field that looks up its value from the utility table.

You can go to any record in the sequences table, paste in the limiting sequence, and, while still in the field, select Replace Field Contents from the Records menu. Prepare for another couple of minutes of computing time...
