Maarten Witberg Posted May 10, 2005

I am trying to set up a cross-referencing feature in a knowledge database. The database consists of records with a single text field (there are actually a lot of overhead fields, but never mind those). Each text field typically contains about 100-200 words of fully grammatical sentences, not lists of words. What I want to achieve is to show a cross-reference list of the words that the author of the text has put in an index (not FM's own indexing feature!).

This is what I've done so far. A script loops through the text field word by word, presenting each word to the author. The author decides whether to put the word in the index or in an unwanted words list (obviously the majority of words will end up there). The script is self-learning in a way: words that are already in the index or the unwanted list are skipped, so the more records the author indexes, the faster it should theoretically go. You could start out with a basic word list like "the", "a", "is", "there" etc. to make the start quicker. By making a self-join between a menu field and the index items field, you can then quickly filter out the cross-references.

So far so hoopy, but I rather think I am not out of the woods yet and I need some help.

1) The self-join means that all items are self-referential, so you would always be pointed to the record you're in. This is redundant information. Is there a way to not show the current record in the portal?

2) The unwanted words list (a global field) may become quite long, although I expect it to grow fast at the start and then slow down, as many words will occur multiple times. A field may contain 64000 characters, so I estimate that something like 7,000 to 9,000 words will fit. This looks like a lot, but where do I go if I hit this ceiling? Version 7, finally? (Now there's an excuse!)

3) In a test run I selected between ten and twenty indexable words per record. Only a few of them may actually have a reference elsewhere in the file. I don't see a user who has to try out every indexed word for hits being very happy for long. So I see two ways. Ideally, there would be a way to pick out the indexed words that occur in other records and highlight them in some way - and this would have to be an updateable feature, because records may be added, edited and deleted all the time. The second way is to parse out the indexed items in the record like so:

t th the s se sec seco secon second w wa etcetera

and use a quicksearch-like feature - but that is getting dangerously close to a normal search and does not give the user a clue as to which words are indexed. The plus is that it would allow references to partial words - e.g. from "refere" you would score hits like "referential" and "reference". Which is quite nice, come to think of it.

4) The script I set up only allows single words to be indexed, not word combinations such as "word combination". Any ideas?

If I try to discuss this with my colleagues they just look at me as if I am from another planet, so I hope someone will share their thoughts and ideas - both on the coding level and on the wider issue of cross-referencing and indexing. It looks like I am setting up some kind of hybrid, and that may be a dead end. I'm attaching a file so you can see what I am working on.

kjoe

PS re 2): the sample file has 980 words in all records. Of these, about a hundred were indexed and about 300 were put in the unwanted list. So, crudely extrapolating, you could say one in three words will end up there (or any word will occur on average three times in the file). That means I can index a 192,000 word text without trouble. That's like 700 pages of A4 text... does this reasoning hold?
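For readers who want to follow the logic outside FileMaker, the two mechanisms described in the post above (the self-learning word loop and the partial-word keys of point 3) can be sketched in Python. This is only an illustration under assumptions: the function names, the `decide` callback standing in for the custom dialog, and the use of sets for the two lists are hypothetical and not taken from the attached file.

```python
def classify_words(text, index, unwanted, decide):
    """Walk the text word by word and ask `decide` only about unseen words."""
    for raw in text.split():
        # Strip surrounding punctuation and ignore case (an assumption).
        word = raw.strip('.,;:!?()"\'').lower()
        if not word or word in index or word in unwanted:
            continue  # already classified: this is the "self learning" skip
        choice = decide(word)  # hypothetical stand-in for the dialog box
        if choice == "index":
            index.add(word)
        elif choice == "unwanted":
            unwanted.add(word)
        # any other answer (e.g. "ignore") leaves both lists untouched


def prefix_keys(words):
    """Expand each indexed word into all of its leading fragments."""
    keys = []
    for word in words:
        for end in range(1, len(word) + 1):
            keys.append(word[:end])
    return keys


# Start with a small stop list, as suggested in the post.
index, unwanted = set(), {"the", "a", "is", "there"}
asked = []

def decide(word):
    asked.append(word)
    return "index" if word in {"second", "way"} else "unwanted"

classify_words("The second way is a way.", index, unwanted, decide)
print(asked)                           # words the author was actually asked about
print(prefix_keys(["the", "second"]))  # t, th, the, s, se, sec, ...
```

In this run the author is only asked about "second" and "way"; the stop words and the repeated "way." are skipped. The fragment list grows with the square of the word length, which is worth keeping in mind given the field size limit raised in point 2.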
Ender Posted May 10, 2005

Hi kjoe,

1. Omitting the current record from the portal requires you to adjust the key on the parent side so that it excludes the current record. This can be done by using the ValueListItems() function to pull the serial numbers from the related records, then excluding the current record via the Substitute() function. This unstored calc is then used as the new parent key. You can see this in the attached example (I have not fixed your references to the old portal).

2. It looks like you're building the unwanted words in a global for all records. I would have thought this unwanted words list would be built separately for each record. Are you assuming that if a word is unwanted for one record, it will be unwanted for all?

3. I got a little lost on your requirements for this one. A self-join on the multi-key word list would get you the list of related records, but for highlighting, you may need to match each word separately. Actually, I don't know how you'd make text 'highlight' in FM5/6 within the text block (if this is what you had in mind).

4. I don't see a way to do phrases unless you allow users to type them manually.
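The idea in point 1 (collect the serial numbers of the related records, drop the current record's own serial, and use the remainder as a multi-line key) can be sketched in Python as follows. This is a rough illustration, not the calculation from the attached example: the `records` dictionary, the serial numbers and both function names are made up for the sketch.

```python
def related_serials(records, current_id):
    """Serials of every record sharing at least one index word (includes self)."""
    words = records[current_id]
    return [rid for rid, other in records.items() if words & other]


def parent_key_without_self(records, current_id):
    """Return-delimited key of related serials, excluding the current record."""
    return "\n".join(
        rid for rid in related_serials(records, current_id) if rid != current_id
    )


# Hypothetical data: serial number -> set of indexed words.
records = {
    "1": {"portal", "index"},
    "2": {"index", "script"},
    "3": {"layout"},
    "12": {"portal"},
}

print(parent_key_without_self(records, "1").split("\n"))
```

The sketch compares whole serial numbers rather than replacing text inside the list. With a plain text substitution, removing "1" could also damage "12", so in a FileMaker calc it is probably safest to include the value separators in the text being replaced.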
Maarten Witberg Posted May 10, 2005

Hi Ender,

1. Great! Many thanks!

2. Interesting point. There is an ignore button in the dialog box which skips the word without putting it in either list; maybe I should make that the default action rather than adding to the unwanted list. I am assuming that unwanted means unwanted for all, because the majority of words will never be used for indexing. The whole database would be on one subject, so interesting words are interesting for all records, and the same goes for uninteresting words. But in the final app I guess the ignore list and the FullIndex field should be easily editable in some way - or, when re-indexing a record, it should be possible to ignore the unwanted list. Pity the custom dialog only has three buttons.

3. Highlighting words in the text field was not what I meant. I thought I would present the full list of indexed words and make a bullet appear beside them, a ! or something simple, to signify that they have cross-references. But I am starting to like the idea of the mock typeahead that allows hits on partial words. I think I'll pursue that for now.

4. I suspected as much. It can probably be done manually in a sufficiently user-friendly way, using a script and a separate layout.

Thanks again!

kjoe
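The marker idea from point 3 (show a record's indexed words and put a "!" next to those that also occur in another record) could look roughly like this in Python. It is a sketch under assumptions: the data layout and the function name are hypothetical, and the flags are recomputed on every call rather than stored, which is one simple way to stay correct when records are added, edited or deleted.

```python
def flag_cross_referenced(records, current_id):
    """List the current record's indexed words, marking those found elsewhere."""
    # Collect every indexed word that appears in any OTHER record.
    elsewhere = set()
    for rid, words in records.items():
        if rid != current_id:
            elsewhere |= words
    # "! " marks a word with at least one cross-reference, "  " one without.
    return [
        ("! " if word in elsewhere else "  ") + word
        for word in sorted(records[current_id])
    ]


# Hypothetical data: serial number -> set of indexed words.
records = {
    "1": {"portal", "index", "calc"},
    "2": {"index", "script"},
    "3": {"portal"},
}

for line in flag_cross_referenced(records, "1"):
    print(line)
```

Here "index" and "portal" get a marker because records 2 and 3 use them, while "calc" does not. If record 3 were deleted, the next call would drop the marker on "portal" without any extra bookkeeping.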