Jump to content
Salesforce and other SMB Solutions are coming soon. ×

Parsing multiple text values into multiple numbers


Ender
 Share

Recommended Posts

I don't know if I'm still groggy from my weekend off or what, but I can't seem to get my head around this.

 

I need to take values from a text field and make a comma separated list of numbers that represent those values. Specifically, I have a Race field that may contain a variety of text, and may contain more than one value, like:

African American

 

Black or African American¶White

 

Hispanic¶White

 

Hawaiian or Other Pacific Islander

and I need to get those translated into numbers, where the numbers correspond to the races:

 

1 - American Indian/Alaskan Native

(A person having origins in any of the original peoples of North and South America, and who maintains a tribal affiliation)

2 - Asian

(A person having origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent)

3 - Black/African American

(A person having origins in any of the Black racial groups of Africa)

4 - Hispanic/Latino

(A person of Cuban, Mexican, Puerto Rican, South or Central American, or other Spanish culture or origin, regardless of race)

5 - Native Hawaiian or Other Pacific Islander

(A person having origins in any of the original peoples of Hawaii, Guam, Samoa, or other Pacific Islands)

6 - White

(A person having origins in any of the original peoples of Europe, the Middle East, or North Africa)

7 - Other

(A person reporting an ethnicity/race other than those listed)

 

The result field should be in the format:

 

6

 

or if there's more than one race:

 

4,6

 

Unfortunately, because the format of the source text values is not very consistent, what I think I need to do is check the source text for each Race category and then assemble the result numbers. This is what I have so far for determining whether each Race is represented in the source text:

 

Let(
[
americanIndian = Case(Position(Race;"American Indian";1;1) or Position(Race; "Native American";1;1);1);

asian = Case(Position(Race; "Asian";1;1) or Position(Race; "Chinese";1;1) or 
Position(Race; "Hmong";1;1) or Position(Race; "Vietnamese";1;1)  or 
Position(Race; "Laotian";1;1) or Position(Race; "Japanese";1;1);2);

black =  Case(Position(Race; "Black";1;1) or Position(Race; "African";1;1); 3);

hispanic = Case(Position(Race; "Hispanic";1;1) or Position(Race; "Latin";1;1) or Position(Race; "Mexican";1;1) or Position(Race; "Puerto Rican";1;1); 4);

hawaiian = Case(Position(Race; "Hawaiian";1;1); 5);

white = Case(Position(Race; "White";1;1) or Position(Race; "Caucasian";1;1); 6);

raceCount = (americanIndian > 0) + (asian > 0) + (black > 0) + (hispanic > 0) + (hawaiian > 0) + (white > 0)

];

???

)

What I'm having trouble with is how to assemble the result string of comma-separated numbers in a relatively simple manner. Am I going to need a recursive CF for this?

Link to comment
Share on other sites

I think you can do something like:

 

...

result = americanIndian & " " & asian & " " & black & " " & hispanic & " " & hawaiian & " " & white & " " & other

] ;

Substitute ( TrimAll (

Substitute ( result ; "0" ; "" ) ;

0 ; 0 ) ;

" " ; ", " )

)

Link to comment
Share on other sites

Ouch! You don't really care what order the numbers come in, do you?

 

If not, try this instead:

 

Substitute(Substitute(Substitute(Substitute(Substitute(Substitute(Substitute(Substitute(Substitute("American Indian/Alaskan Native", 1), "Asian", 2), "Black/African American", 3), "Hispanic/Latino", 4), "Native Hawaiian", 5), "Other Pacific Islander", 5), "White", 6), "¶", ",")

 

 

 

In this example, I'm pretending that the values in your text field line up pretty neatly with the values that have the numbers attributed to them, but as you can see with category 5 you can substitute the same number for more than one string.

 

If you don't already have a value list driving your original text values, create one from the values in the text field, so as to obtain a comprehensive list of all the values deployed, so you can be sure to substitute a number for each of them as appropriate. (Especially if "Other" is actually specified rather than entered as "Other").

 

if you really really need the numbers to be in consecutive order, bracket the whole formula (as it exists so far) with consecutive Case statements testing for PatternCount within the outcome of the formula (as it exists so far). Fortunately your numbering scheme doesn't go beyond 9 so you don't have to worry about the string "10" giving you a false positive for 1. (Not that you couldn't control for that if it did)

Link to comment
Share on other sites

Ender,

 

my 2c...

 

1) why a number coding? If you would use it for further numerical analysis, I'd imagine you need something else than comma-separated, and if it's for readers, personally I'd prefer seeing "asian, hispanic" instead of "3, 5" and having to mentally translate back.

 

2) why are you hardcoding this? can you foresee all possible race indications that you're trying to categorise? (and for that matter, all categories?) I'm thinking along the lines of a conditional value list you'd compare the field values against. Not sure how though. Would be interesting because it would give a general method for extracting phrases out of a text and grouping them into categories.

 

kjoe

Link to comment
Share on other sites

Thanks guys. The corresponding codes are for a Federal report (the order of the numbers doesn't matter).

 

While the government now asks for these specific categories, the categories have changed over the years, and we haven't particularly liked the choices they have offered. We find people are less likely to answer the Race/Ethnicity question if it's limited to the government's categorizations. For example, if someone is from Kenya, they probably don't think of themselves as "Black/African American", even though for the purposes of this report, they would be. So for years we had asked the question as a fill-in-the-blank (which didn't produce very consistent results), and now have a more limited set of values. But even if our values were limited to the choices that the govenment likes, the problem of translating them into comma-separated numbers would remain.

 

Anyway, it looks like the TrimAll() solution posted by comment was exactly what I was looking for.

 

Thanks again.

Link to comment
Share on other sites

But even if our values were limited to the choices that the govenment likes

 

There's something very deep about that statement :P

 

Probably moot, and overshooting what you're after, but still. If the government keeps changing categories and the people that fill out your questionnaire keep adding ethnic/racial descriptions, then maybe a table that matches descriptions to categories and codes is called for. And then make a conditional value list using the user selected values. So then your calc would simply be

 

Substitute( ValueListItems( Get(Filename ) ; "SelectedGovCodes" ) ; ¶ ; ", " )

 

Where SelectedGovCodes is built from a relation between the original race field in the questionnaire and the description field in the categories table.

 

The tradeoffs are that it's an unstored calc and you've got an extra table to look after. I'm pretty sure all this is nothing new for you. I hadn't personally thought of using a conditional list backwards though, beforehand.

 

kjoe

Link to comment
Share on other sites

 Share



×
×
  • Create New...

Important Information

Terms of Use