str_word_count

(PHP 4 >= 4.3.0, PHP 5, PHP 7, PHP 8)

str_word_count — 文字列に使用されている単語についての情報を返す

説明

str_word_count(string $string, int $format = 0, ?string $characters = null): array|int

string に含まれる単語数を数えます。オプションの format が指定されていない場合、見つかった単語の数を整数値で返します。 format が指定されている場合は結果が配列で返され、配列の内容は format に依存します。 format に設定できる値と対応する出力については以下で示します。

この関数を使用するうえで、'単語' は「ロケールに依存したアルファベットからなる文字列で、その先頭以外の部分に "'" および "-" を含めることができる」ものと定義されています。マルチバイト文字列を使うロケールはサポートされていないので、注意が必要です。

パラメータ

string

文字列。

format

この関数の戻り値を設定します。現在サポートされている値は以下のとおりです。

0 - 見つかった単語の数を返します。
1 - string の中に見つかった単語を含む配列を返します。
2 - 連想配列を返します。string の中での単語の開始位置がキー、単語自体を対応する値となります。

characters

'単語' とみなされる文字に追加する文字のリスト。

戻り値

選択した format に応じて、配列あるいは整数を返します。

変更履歴

バージョン	説明
8.0.0	`characters` は、nullable になりました。

例

例1 str_word_count() の例

<?php

$str = "Hello fri3nd, you're
       looking          good today!";

print_r(str_word_count($str, 1));
print_r(str_word_count($str, 2));
print_r(str_word_count($str, 1, 'àáãç3'));

echo str_word_count($str);

?>

上の例の出力は以下となります。

Array
(
    [0] => Hello
    [1] => fri
    [2] => nd
    [3] => you're
    [4] => looking
    [5] => good
    [6] => today
)

Array
(
    [0] => Hello
    [6] => fri
    [10] => nd
    [14] => you're
    [29] => looking
    [46] => good
    [51] => today
)

Array
(
    [0] => Hello
    [1] => fri3nd
    [2] => you're
    [3] => looking
    [4] => good
    [5] => today
)

7

参考

explode() - 文字列を文字列により分割する
preg_split() - 正規表現で文字列を分割する
count_chars() - 文字列で使用されている文字に関する情報を返す
substr_count() - 副文字列の出現回数を数える

Improve This Page

Learn How To Improve This Page • Submit a Pull Request • Report a Bug

＋add a note

User Contributed Notes 32 notes

down

cito at wikatu dot com ¶

12 years ago

<?php

/***
 * This simple utf-8 word count function (it only counts) 
 * is a bit faster then the one with preg_match_all
 * about 10x slower then the built-in str_word_count
 * 
 * If you need the hyphen or other code points as word-characters
 * just put them into the [brackets] like [^\p{L}\p{N}\'\-]
 * If the pattern contains utf-8, utf8_encode() the pattern,
 * as it is expected to be valid utf-8 (using the u modifier).
 **/

// Jonny 5's simple word splitter
function str_word_count_utf8($str) {
  return count(preg_split('~[^\p{L}\p{N}\']+~u',$str));
}
?>

down

splogamurugan at gmail dot com ¶

15 years ago

We can also specify a range of values for charlist.



<?php

$str = "Hello fri3nd, you're

       looking          good today! 

       look1234ing";

print_r(str_word_count($str, 1, '0..3'));

?>



will give the result as 



Array ( [0] => Hello [1] => fri3nd [2] => you're [3] => looking [4] => good [5] => today [6] => look123 [7] => ing )

down

MadCoder ¶

18 years ago

Here's a function that will trim a $string down to a certian number of words, and add a...   on the end of it.

(explansion of muz1's 1st 100 words code)



----------------------------------------------

<?php

function trim_text($text, $count){

$text = str_replace("  ", " ", $text);

$string = explode(" ", $text);

for ( $wordCounter = 0; $wordCounter <= $count;wordCounter++ ){ 

$trimed .= $string[$wordCounter];

if ( $wordCounter < $count ){ $trimed .= " "; }

else { $trimed .= "..."; }

}

$trimed = trim($trimed);

return $trimed;

}

?>



Usage

------------------------------------------------

<?php

$string = "one two three four";

echo trim_text($string, 3);

?>



returns:

one two three...

down

Adeel Khan ¶

16 years ago

<?php

/**
 * Returns the number of words in a string.
 * As far as I have tested, it is very accurate.
 * The string can have HTML in it,
 * but you should do something like this first:
 *
 *    $search = array(
 *      '@<script[^>]*?>.*?</script>@si',
 *      '@<style[^>]*?>.*?</style>@siU',
 *      '@<![\s\S]*?--[ \t\n\r]*>@'
 *    );
 *    $html = preg_replace($search, '', $html);
 *
 */

function word_count($html) {

  # strip all html tags
  $wc = strip_tags($html);

  # remove 'words' that don't consist of alphanumerical characters or punctuation
  $pattern = "#[^(\w|\d|\'|\"|\.|\!|\?|;|,|\\|\/|\-|:|\&|@)]+#";
  $wc = trim(preg_replace($pattern, " ", $wc));

  # remove one-letter 'words' that consist only of punctuation
  $wc = trim(preg_replace("#\s*[(\'|\"|\.|\!|\?|;|,|\\|\/|\-|:|\&|@)]\s*#", " ", $wc));

  # remove superfluous whitespace
  $wc = preg_replace("/\s\s+/", " ", $wc);

  # split string into an array of words
  $wc = explode(" ", $wc);

  # remove empty elements
  $wc = array_filter($wc);

  # return the number of words
  return count($wc);

}

?>

down

brettNOSPAM at olwm dot NO_SPAM dot com ¶

21 years ago

This example may not be pretty, but It proves accurate:



<?php

//count words

$words_to_count = strip_tags($body);

$pattern = "/[^(\w|\d|\'|\"|\.|\!|\?|;|,|\\|\/|\-\-|:|\&|@)]+/";

$words_to_count = preg_replace ($pattern, " ", $words_to_count);

$words_to_count = trim($words_to_count);

$total_words = count(explode(" ",$words_to_count));

?>



Hope I didn't miss any punctuation. ;-)

down

-1

charliefrancis at gmail dot com ¶

14 years ago

Hi this is the first time I have posted on the php manual, I hope some of you will like this little function I wrote.

It returns a string with a certain character limit, but still retaining whole words.
It breaks out of the foreach loop once it has found a string short enough to display, and the character list can be edited.

<?php
function word_limiter( $text, $limit = 30, $chars = '0123456789' ) {
    if( strlen( $text ) > $limit ) {
        $words = str_word_count( $text, 2, $chars );
        $words = array_reverse( $words, TRUE );
        foreach( $words as $length => $word ) {
            if( $length + strlen( $word ) >= $limit ) {
                array_shift( $words );
            } else {
                break;
            }
        }
        $words = array_reverse( $words );
        $text = implode( " ", $words ) . '&hellip;';
    }
    return $text;
}

$str = "Hello this is a list of words that is too long";
echo '1: ' . word_limiter( $str );
$str = "Hello this is a list of words";
echo '2: ' . word_limiter( $str );
?>

1: Hello this is a list of words&hellip;
2: Hello this is a list of words

down

-2

uri at speedy dot net ¶

11 years ago

Here is a count words function which supports UTF-8 and Hebrew. I tried other functions but they don't work. Notice that in Hebrew, '"' and '\'' can be used in words, so they are not separators. This function is not perfect, I would prefer a function we are using in JavaScript which considers all characters except [a-zA-Zא-ת0-9_\'\"] as separators, but I don't know how to do it in PHP.

I removed some of the separators which don't work well with Hebrew ("\x20", "\xA0", "\x0A", "\x0D", "\x09", "\x0B", "\x2E"). I also removed the underline.

This is a fix to my previous post on this page - I found out that my function returned an incorrect result for an empty string. I corrected it and I'm also attaching another function - my_strlen.

<?php 

function count_words($string) {
    // Return the number of words in a string.
    $string= str_replace("&#039;", "'", $string);
    $t= array(' ', "\t", '=', '+', '-', '*', '/', '\\', ',', '.', ';', ':', '[', ']', '{', '}', '(', ')', '<', '>', '&', '%', '$', '@', '#', '^', '!', '?', '~'); // separators
    $string= str_replace($t, " ", $string);
    $string= trim(preg_replace("/\s+/", " ", $string));
    $num= 0;
    if (my_strlen($string)>0) {
        $word_array= explode(" ", $string);
        $num= count($word_array);
    }
    return $num;
}

function my_strlen($s) {
    // Return mb_strlen with encoding UTF-8.
    return mb_strlen($s, "UTF-8");
}

?>

down

-1

manrash at gmail dot com ¶

15 years ago

For spanish speakers a valid character map may be:



<?php

$characterMap = 'áéíóúüñ';



$count = str_word_count($text, 0, $characterMap);

?>

down

-2

brettz9 - see yahoo ¶

14 years ago

Words also cannot end in a hyphen unless allowed by the charlist...

down

-2

Anonymous ¶

19 years ago

This function seems to view numbers as whitespace. I.e. a word consisting of numbers only won't be counted.

down

-2

php dot net at salagir dot com ¶

6 years ago

This function doesn't handle  accents, even in a locale with accent.
<?php
echo str_word_count("Is working"); // =2

setlocale(LC_ALL, 'fr_FR.utf8');
echo str_word_count("Not wôrking"); // expects 2, got 3.
?>

Cito solution treats punctuation as words and thus isn't a good workaround.
<?php
function str_word_count_utf8($str) {
      return count(preg_split('~[^\p{L}\p{N}\']+~u',$str));
}
echo str_word_count_utf8("Is wôrking"); //=2
echo str_word_count_utf8("Not wôrking."); //=3
?>

My solution:
<?php
function str_word_count_utf8($str) {
    $a = preg_split('/\W+/u', $str, -1, PREG_SPLIT_NO_EMPTY);
    return count($a);
}
echo str_word_count_utf8("Is wôrking"); // = 2
echo str_word_count_utf8("Is wôrking! :)"); // = 2
?>

down

-2

dmVuY2lAc3RyYWhvdG5pLmNvbQ== (base64) ¶

13 years ago

to count words after converting a msword document to plain text with antiword, you can use this function:

<?php
function count_words($text) {
    $text = str_replace(str_split('|'), '', $text); // remove these chars (you can specify more)
    $text = trim(preg_replace('/\s+/', ' ', $text)); // remove extra spaces
    $text = preg_replace('/-{2,}/', '', $text); // remove 2 or more dashes in a row
    $len = strlen($text);
    
    if (0 === $len) {
        return 0;
    }
    
    $words = 1;
    
    while ($len--) {
        if (' ' === $text[$len]) {
            ++$words;
        }
    }
    
    return $words;
}
?>

it strips the pipe "|" chars, which antiword uses to format tables in its plain text output, removes more than one dashes in a row (also used in tables), then counts the words.

counting words using explode() and then count() is not a good idea for huge texts, because it uses much memory to store the text once more as an array. this is why i'm using while() { .. } to walk the string

down

-1

rcATinterfacesDOTfr ¶

21 years ago

Here is another way to count words :
$word_count = count(preg_split('/\W+/', $text, -1, PREG_SPLIT_NO_EMPTY));

down

-3

jazz090 ¶

15 years ago

Personally, I dont like using this function becuase the characters it omits are sometime nessesery for instance MS Word counts ">" or "<" alone as single word where this function doesnt. I like using this however, it counts EVERYTHING:



<?php

function num_words($string){

    preg_match_all("/\S+/", $string, $matches);

    return count($matches[0]);

}

?>

down

-2

joshua dot blake at gmail dot com ¶

17 years ago

I needed a function which would extract the first hundred words out of a given input while retaining all markup such as line breaks, double spaces and the like. Most of the regexp based functions posted above were accurate in that they counted out a hundred words, but recombined the paragraph by imploding an array down to a string. This did away with any such hopes of line breaks, and thus I devised a crude but very accurate function which does all that I ask it to:



<?php

function Truncate($input, $numWords) 

{

  if(str_word_count($input,0)>$numWords)

  {

    $WordKey = str_word_count($input,1);

    $PosKey = str_word_count($input,2);

    reset($PosKey);

    foreach($WordKey as $key => &$value)

    {

        $value=key($PosKey);

        next($PosKey);

    }

    return substr($input,0,$WordKey[$numWords]);

  }

  else {return $input;}

}

?>



The idea behind it? Go through the keys of the arrays returned by str_word_count and associate the number of each word with its character position in the phrase. Then use substr to return everything up until the nth character. I have tested this function on rather large entries and it seems to be efficient enough that it does not bog down at all.



Cheers!



Josh

down

-5

josh at joshblake.net ¶

17 years ago

I was interested in a function which returned the first few words out of a larger string.



In reality, I wanted a preview of the first hundred words of a blog entry which was well over that.



I found all of the other functions which explode and implode strings to arrays lost key markups such as line breaks etc.



So, this is what I came up with:



<?php

function WordTruncate($input, $numWords) {

if(str_word_count($input,0)>$numWords)

{

    $WordKey = str_word_count($input,1);

    $WordIndex = array_flip(str_word_count($input,2));

    return substr($input,0,$WordIndex[$WordKey[$numWords]]);

}

else {return $input;}

}

?>



While I haven't counted per se, it's accurate enough for my needs. It will also return the entire string if it's less than the specified number of words.



The idea behind it? Use str_word_count to identify the nth word, then use str_word_count to identify the position of that word within the string, then use substr to extract up to that position.



Josh.

down

-2

Samer Ata ¶

12 years ago

This is my own version of to get SEO meta description from wordpress post content. it is also generic usage function to get the first n words from a string.

<?php
function my_meta_description($text,$n=10)
{
$text=strip_tags($text);  // not neccssary for none HTML
// $text=strip_shortcodes($text); // uncomment only inside wordpress system
$text = trim(preg_replace("/\s+/"," ",$text));
$word_array = explode(" ", $text);
if (count($word_array) <= $n)
return implode(" ",$word_array);
else
{
$text='';
foreach ($word_array as $length=>$word)
{
    $text.=$word ;
    if($length==$n) break;
    else $text.=" ";
}
}
return $text;
?>

down

-2

philip at cornado dot com ¶

21 years ago

Some ask not just split on ' ', well, it's because simply exploding on a ' ' isn't fully accurate.  Words can be separated by tabs, newlines, double spaces, etc.  This is why people tend to seperate on all whitespace with regular expressions.

down

-5

aix at lux dot ee ¶

19 years ago

One function.
<?php
if (!function_exists('word_count')) {
function word_count($str,$n = "0"){
    $m=strlen($str)/2;
    $a=1;
    while ($a<$m) {
        $str=str_replace("  "," ",$str);
        $a++;
        }
    $b = explode(" ", $str);
    $i = 0; 
    foreach ($b as $v) {
        $i++;
        }
    if ($n==1) return $b;
    else  return $i;

    }
}
$str="Tere Tartu linn";
$c  = word_count($str,1); // it return an array
$d  = word_count($str); // it return int - how many words was in text
print_r($c);
echo $d;
?>

down

-2

Kirils Solovjovs ¶

20 years ago

Nothing of this worked for me. I think countwords() is very encoding dependent. This is the code for win1257. For other layots you just need to redefine the ranges of letters...

<?php
function countwords($text){
        $ls=0;//was it a whitespace?
        $cc33=0;//counter
        for($i=0;$i<strlen($text);$i++){
                $spstat=false; //is it a number or a letter?
                $ot=ord($text[$i]);
                if( (($ot>=48) && ($ot<=57)) ||  (($ot>=97) && ($ot<=122)) || (($ot>=65) && ($ot<=90)) || ($ot==170) ||
                (($ot>=192) && ($ot<=214)) || (($ot>=216) && ($ot<=246)) || (($ot>=248) && ($ot<=254))  )$spstat=true;
                if(($ls==0)&&($spstat)){
                        $ls=1;
                        $cc33++;
                }
                if(!$spstat)$ls=0;
        }
        return $cc33;
}

?>

down

-5

Anonymous ¶

17 years ago

Here is a php work counting function together with a javascript version which will print the same result.

<?php
      //Php word counting function
      function word_count($theString)
      {
        $char_count = strlen($theString);
        $fullStr = $theString." ";
        $initial_whitespace_rExp = "^[[:alnum:]]$";
        
        $left_trimmedStr = ereg_replace($initial_whitespace_rExp,"",$fullStr);
        $non_alphanumerics_rExp = "^[[:alnum:]]$";
        $cleanedStr = ereg_replace($non_alphanumerics_rExp," ",$left_trimmedStr);
        $splitString = explode(" ",$cleanedStr);
        
        $word_count = count($splitString)-1;
        
        if(strlen($fullStr)<2)
        {
          $word_count=0;
        }      
        return $word_count;
      }
?>

<?php
      //Function to count words in a phrase
      function wordCount(theString)
      {
        var char_count = theString.length;
        var fullStr = theString + " ";
        var initial_whitespace_rExp = /^[^A-Za-z0-9]+/gi;
        var left_trimmedStr = fullStr.replace(initial_whitespace_rExp, "");
        var non_alphanumerics_rExp = rExp = /[^A-Za-z0-9]+/gi;
        var cleanedStr = left_trimmedStr.replace(non_alphanumerics_rExp, " ");
        var splitString = cleanedStr.split(" ");
        
        var word_count = splitString.length -1;
        
        if (fullStr.length <2) 
        {
          word_count = 0;
        }      
        return word_count;
      }
?>

down

-4

Artimis ¶

20 years ago

Never use this function to count/separate alphanumeric words, it will just split them up words to words, numbers to numbers.  You could refer to another function "preg_split" when splitting alphanumeric words.  It works with Chinese characters as well.

down

-3

matthewkastor at live dot com ¶

13 years ago

This needs improvement, but works well as is.

<?php
/**
 * Generates an alphabetical index of unique words, and a count of their occurrences, in a file.
 * 
 * This works on html pages or plain text files.
 * This function uses file_get_contents, so it 
 * is possible to use a url instead of a local filename.
 * 
 * Change the search pattern at 
 * <code> $junk = preg_match('/[^a-zA-Z]/', $word); </code>
 * if you want to keep words with numbers or other characters. The pattern
 * I've set searches for anything that is not an upper or lowercase letter,
 * you may want something else.
 * 
 * The array returned will look something like this:
 * <code>
 * Array
 * (
 *     [0] => Array
 *        (
 *            [word] => a
 *            [count] => 21
 *        )
 * 
 *     [1] => Array
 *        (
 *            [word] => ability
 *            [count] => 1
 *        )
 * )
 * </code>
 * 
 * @param string $file The file ( or url ) you want to create an index from.
 * @return array 
 */
function index_page($file) {
    $index = array();
    $find = array(
        '/\r/',
        '/\n/',
        '/\s\s+/'
    );
    $replace = array(
        ' ',
        ' ',
        ' '
    );
    $work = file_get_contents($file);
    $work = preg_replace('/[>][<]/', '> <', $work);
    $work = strip_tags($work);
    $work = strtolower($work);
    $work = preg_replace($find, $replace, $work);
    $work = trim($work);
    $work = explode(' ', $work);
    natcasesort($work);
    $i = 0;
    foreach($work as $word) {
        $word = trim($word);
        $junk = preg_match('/[^a-zA-Z]/', $word);
        if($junk == 1) {
            $word = '';
        }
        if( (!empty($word)) && ($word != '') ) {
            if(!isset($index[$i]['word'])) { // if not set this is a new index
                $index[$i]['word'] = $word;
                $index[$i]['count'] = 1;
            } elseif( $index[$i]['word'] == $word ) {  // count repeats
                $index[$i]['count'] += 1;
            } else { // else this is a different word, increment $i and create an entry
                $i++;
                $index[$i]['word'] = $word;
                $index[$i]['count'] = 1;
            }
        }
    }
    unset($work);
    return($index);
}
?>

example usage:

<?php
$file = 'http://www.php.net/';
// or use a local file, see file_get_contents() for valid filenames and restrictions.

$index = index_page($file);
echo '<pre>'.print_r($index,true).'</pre>';
?>

down

-5

lwright at psu dot edu ¶

17 years ago

If you are looking to count the frequency of words, try:

<?php

$wordfrequency = array_count_values( str_word_count( $string, 1) );

?>

down

-3

andrea at 3site dot it ¶

20 years ago

if string doesn't contain the space " ", the explode method doesn't do anything, so i've wrote this and it seems works better ... i don't know about time and resource



<?php

function str_incounter($match,$string) {

$count_match = 0;

for($i=0;$i<strlen($string);$i++) {

if(strtolower(substr($string,$i,strlen($match)))==strtolower($match)) {

$count_match++;

}

}

return $count_match;

}

?>



example 



<?php

$string = "something:something!!something";

$count_some = str_incounter("something",$string);

// will return 3

?>

down

-5

broncha at rajesharma dot com ¶

8 years ago

Turns out the charlist is set by default for the web. For example, the string

Copyright &copy; ABC Ltd.

is 3 words in the cli and 4 words if executing in web context.

down

-5

eanimator at yahoo dot com ¶

15 years ago

My quick and rough wordLimiter function.



<?php

function WordLimiter($text,$limit=20){

    $explode = explode(' ',$text);

    $string  = '';

        

    $dots = '...';

    if(count($explode) <= $limit){

        $dots = '';

    }

    for($i=0;$i<$limit;$i++){

        $string .= $explode[$i]." ";

    }

        

    return $string.$dots;

}

?>

down

-8

lballard dot cat at gmail dot com ¶

13 years ago

word limiter:



<?php

$str = "my hella long string" ;

$length = 3;

$shortened = 

implode(' ',array_slice(str_word_count($str,1),0,$length));

?>

down

-3

amosbatto at yahoo dot com ¶

3 years ago

//To get an accurate word count in English, some diacritical marks have 
// to be added for words like née, Chloë, naïve, coöpt, façade, piñata, etc.  
$count = str_word_count($str, 0, 'éëïöçñÉËÏÖÇÑ');

//To get the word count for any European language using a Roman alphabet:
$count = str_word_count($str, 0, 'äëïöüÄËÏÖÜáǽćéíĺńóŕśúźÁǼĆÉÍĹŃÓŔŚÚŹ'.
   'àèìòùÀÈÌÒÙãẽĩõñũÃẼĨÕÑŨâêîôûÂÊÎÔÛăĕğĭŏœ̆ŭĂĔĞĬŎŒ̆Ŭ'.
   'āēīōūĀĒĪŌŪőűŐŰąęįųĄĘĮŲåůÅŮæÆøØýÝÿŸþÞẞßđĐıIœŒ'.
   'čďěľňřšťžČĎĚĽŇŘŠŤŽƒƑðÐłŁçģķļșțÇĢĶĻȘȚħĦċėġżĊĖĠŻʒƷǯǮŋŊŧŦ');

down

-4

dev dot vegera at gmail dot com ¶

3 years ago

preg_match_all based function to mimic str_word_count behavior:

<?php
function mb_str_word_count($str, $format = 2, $charlist = '') {
  if ($format < 0 || $format > 2) {
    throw new InvalidArgumentException('Argument #2 ($format) must be a valid format value');
  }
  $count = preg_match_all('#[\p{L}\p{N}][\p{L}\p{N}\'' . $charlist . ']*#u', $str, $matches, $format === 2 ? PREG_OFFSET_CAPTURE : PREG_PATTERN_ORDER);
  if ($format === 0) {
    return $count;
  }
  $matches = $matches[0] ?? [];
  if ($format === 2) {
    $result = [];
    foreach ($matches as $match) {
      $result[$match[1]] = $match[0];
    }
    return $result;
  }
  return $matches;
}
?>

down

-4

aidan at php dot net ¶

19 years ago

This functionality is now implemented in the PEAR package PHP_Compat.

More information about using this function without upgrading your version of PHP can be found on the below link:

http://pear.php.net/package/PHP_Compat

down

-7

jak74 at interia dot pl ¶

7 years ago

// split the phrase by any number of commas or space characters,
// which include " ", \r, \t, \n and \f

$keywords = preg_split("/[\s,]+/", "hypertext language, programming");
print_r($keywords);

＋add a note