62629

Given two strings, how do I find number of reoccurences of one in another?

For example, `s1='abc'`, `s2='kokoabckokabckoab'`. Output should be `3`. (number of times s1 appears in s2).

Not allowed to use `for` or `strfind`. Can only use `reshape`,`repmat`,`size`.

I thought of reshaping s2, so it would contain all of the possible strings of 3s:

s2 =

kok

oko

koa

oab

.... etc

But I'm having troubles from here..

Assuming you have your matrix reshaped into the format you have in your post, you can replicate `s1` and stack the string such that it has as many rows as there are in the reshaped `s2` matrix, then do an equality operator. Rows that consist of all 1s means that we have found a match and so you would simply search for those rows where the total sum is equal to the total length of `s1`. Referring back to my post on dividing up a string into overlapping substrings, we can decompose your string into what you have posted in your question like so:

```%// Define s1 and s2 here s1 = 'abc'; len = length(s1); s2 = 'kokoabckokabckoab'; %// Hankel starts here c = (1 : len).'; r = (len : length(s2)).'; nr = length(r); nc = length(c); x = [ c; r((2:nr)') ]; %-- build vector of user data cidx = (1:nc)'; ridx = 0:(nr-1); H = cidx(:,ones(nr,1)) + ridx(ones(nc,1),:); % Hankel subscripts ind = x(H); % actual data %// End Hankel script %// Now get our data subseqs = s2(ind.'); %// Case where string length is 1 if len == 1 subseqs = subseqs.'; end ``` <hr>

`subseqs` contains the matrix of overlapping characters that you have alluded to in your post. You've noticed a small bug where if the length of the string is 1, then the algorithm won't work. You need to make sure that the reshaped substring matrix consists of a single <strong>column</strong> vector. If we ran the above code without checking the length of `s1`, we would get a row vector, and so simply transpose the result if this is the case.

Now, simply replicate `s1` for as many times as we have rows in `subseqs` so that all of these strings get stacked into a 2D matrix. After, do an equality operator.

```eqs = subseqs == repmat(s1, size(subseqs,1), 1); ```

Now, find the column-wise sum and see which elements are equal to the length of your string. This will produce a single column vector where `1` indicates that we have found a match, and zero otherwise:

```sum(eqs, 2) == len ans = 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 ```

Finally, to add up <strong>how many times</strong> the substring matched, you just have to add up all elements in this vector:

```out = sum(sum(eqs, 2) == len) out = 2 ``` <hr>

As such, we have <strong>two</strong> instances where `abc` is found in your string.

Here is another one,

```s1='abc'; s2='bkcokbacaabcsoabckokabckoabc'; [a,b] = ismember(s2,s1); b = [0 0 b 0 0]; a1=circshift(b,[0 -1]); a2=circshift(b,[0 -2]); sum((b==1)&(a1==2)&(a2==3)) ```

It gives `3` for your input and `4` for my example, and it seems to work well if `ismember` is okey.

Just for the fun of it: this can be done with `nlfilter` from the Image Processing Toolbox (I just discovered this function today and am eager to apply it!):
```ds1 = double(s1); ds2 = double(s2); result = sum(nlfilter(ds2, [1 numel(ds1)], @(x) all(x==ds1))); ```