Friday, April 29, 2011

Objective C strip out whitespace from NSString

How do I strip out the extra whitespace characters from an NSString?
Consider the following example:

NSString *tmpString = @"    Hello      this  is a   long       string!   ";

I want to convert that to: "Hello this is a long string!". I have tried a function like this, but it's not working:

-(NSString *)cleanupString:(NSString *)theString
{
  // Remove extra whitespace
  NSString *theStringTrimmed = [theString stringByTrimmingCharactersInSet: [NSCharacterSet whitespaceAndNewlineCharacterSet]];
  // Split it on whitespace
  NSArray *wordsInStringArray = [theStringTrimmed componentsSeparatedByString:@"  "];
  // Now join it together again
  NSString *returnString = [[NSString alloc] init];
  for (NSString *thisElement in wordsInStringArray)
  {
    returnString = [returnString stringByAppendingString:[NSString stringWithFormat:@"%@_", thisElement]];
  }
  return returnString;
}

From stackoverflow
  • C has no good, simple way of dealing with string manipulation like this. Your approach is probably the best/easiest you're going to get with it. However, instead of separating your words on a fixed two-space string, use componentsSeparatedByCharactersInSet: with whitespaceCharacterSet. Runs of whitespace produce empty components, so skip those before joining the words back together with a single space. That should give you the desired results.

  • Here's a snippet from an NSString extension, where "self" is the NSString instance. It can be used to collapse contiguous whitespace into a single space by passing [NSCharacterSet whitespaceAndNewlineCharacterSet] and ' ' as the two arguments.

    - (NSString *)stringCollapsingCharacterSet:(NSCharacterSet *)characterSet toCharacter:(unichar)ch {
        NSUInteger fullLength = [self length];
        NSUInteger length = 0;
        // The output is never longer than the input, so fullLength + 1 slots suffice.
        unichar *newString = malloc(sizeof(unichar) * (fullLength + 1));
        if (newString == NULL) {
            return nil;
        }

        BOOL isInCharset = NO;
        for (NSUInteger i = 0; i < fullLength; i++) {
            unichar thisChar = [self characterAtIndex:i];

            if ([characterSet characterIsMember:thisChar]) {
                // Remember that a run is in progress; emit nothing yet.
                isInCharset = YES;
            }
            else {
                if (isInCharset) {
                    // A run just ended: replace the whole run with one ch.
                    newString[length++] = ch;
                }

                newString[length++] = thisChar;
                isInCharset = NO;
            }
        }
        // Note: a leading run still yields one ch; a trailing run is dropped.

        NSString *result = [NSString stringWithCharacters:newString length:length];

        free(newString);

        return result;
    }
    
  • This should do it...

    NSString *s = @"this is    a  string    with lots  of     white space";
    NSArray *comps = [s componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
    
    NSMutableArray *words = [NSMutableArray array];
    for(NSString *comp in comps) {
      if([comp length] > 0) {
        [words addObject:comp];
      }
    }
    
    NSString *result = [words componentsJoinedByString:@" "];
    
  • Alternative solution: get yourself a copy of OgreKit (the Cocoa regular expressions library).

    • OgreKit (Japanese webpage -- code is in English)
    • OgreKit (Google autotranslation):

    The whole function is then:

    NSString *theStringTrimmed =
       [theString stringByTrimmingCharactersInSet:
            [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    OGRegularExpression  *regex =
        [OGRegularExpression regularExpressionWithString:@"\\s+"];
    return [regex replaceAllMatchesInString:theStringTrimmed withString:@" "];
    

    Short and sweet.

    If you're after the fastest solution, a carefully constructed series of instructions using NSScanner would probably work best but that'd only be necessary if you plan to process huge (many megabytes) blocks of text.
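
    The NSScanner idea mentioned above could look roughly like the following. This is only a sketch, not the answerer's code: the helper name CollapseWhitespaceWithScanner is hypothetical, it relies on NSScanner's default behaviour of skipping whitespace and newlines before each scan, and its speed has not been measured here.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): collapses every run of
// whitespace/newlines into one space and drops leading and trailing runs.
static NSString *CollapseWhitespaceWithScanner(NSString *input) {
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSScanner *scanner = [NSScanner scannerWithString:input];
    NSMutableString *result = [NSMutableString stringWithCapacity:[input length]];

    NSString *word = nil;
    // The scanner skips whitespace first (its default charactersToBeSkipped),
    // then reads up to the next whitespace character. It returns NO once no
    // non-whitespace characters remain.
    while ([scanner scanUpToCharactersFromSet:whitespace intoString:&word]) {
        if ([result length] > 0) {
            [result appendString:@" "];
        }
        [result appendString:word];
    }
    return result;
}
```

    Calling CollapseWhitespaceWithScanner(tmpString) with the string from the question should give "Hello this is a long string!" without a separate trimming step, because the leading and trailing whitespace is skipped by the scanner itself.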

    Kendall Helmstetter Gelner : Is there a reason to use OgreKit instead of RegExKitLite? http://regexkit.sourceforge.net It has a very similar replaceOccurrencesOfRegex call, and works on top of the existing RegEX libraries (not sure if Ogre is a whole RegEX engine or what)
    Matt Gallagher : I'm sure both will work. I haven't used regexkit but it's a good suggestion to make. People should choose based on the underlying libraries: the Perl-compatible pcre (RegExKitLite) and the Ruby-compatible Oniguruma (OgreKit).
  • Another option for regex is RegexKitLite, which is very easy to embed in an iPhone project:

    [theString stringByReplacingOccurencesOfRegex:@" +" withString:@" "];
    
    norskben : cheers, just caught my eye.
    norskben : It works a tiny bit better without the typo. try [theString stringByReplacingOccurrencesOfRegex:@" +" withString:@" "];
  • Actually there's a very simple solution to that:

    NSString *string = @" spaces in front and at the end ";
    NSString *trimmedString = [string stringByTrimmingCharactersInSet:
                                      [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    NSLog(@"%@", trimmedString);
    

    (Source)

    Brian Postow : I think that this will eliminate only leading and trailing spaces, and eliminate all of them. it won't deal with "hello foo"
    Brian Postow : d*mn line endings and auto-format... it doesn't deal with "hello______foo" (assume _ -> " " because formatting comments is hard)
  • NSString *theString = @"    Hello      this  is a   long       string!   ";
    
    NSCharacterSet *whitespaces = [NSCharacterSet whitespaceCharacterSet];
    NSPredicate *noEmptyStrings = [NSPredicate predicateWithFormat:@"SELF != ''"];
    
    NSArray *parts = [theString componentsSeparatedByCharactersInSet:whitespaces];
    NSArray *filteredArray = [parts filteredArrayUsingPredicate:noEmptyStrings];
    theString = [filteredArray componentsJoinedByString:@" "];
    

    Don't use complex solutions if there's an easy one.

  • A one line solution:

    NSString *whitespaceString = @" String with whitespaces ";

    NSString *trimmedString = [whitespaceString stringByReplacingOccurrencesOfString:@" " withString:@""];
