The output should be 3 (the number of times s1 appears in s2).
Not allowed to use
strfind. Can only use
I thought of reshaping s2 so that it would contain all of the possible substrings of length 3:
But I'm having trouble from here.
Assuming your matrix is reshaped into the format shown in your post, you can replicate s1 and stack the copies so that the result has as many rows as the reshaped s2 matrix, then compare the two with the equality operator. A row that consists of all 1s means that we have found a match, so you simply search for the rows whose sum is equal to the length of s1. Referring back to my post on dividing up a string into overlapping substrings, we can decompose your string into the form you posted in your question like so:
    %// Define s1 and s2 here
    s1 = 'abc';
    len = length(s1);
    s2 = 'kokoabckokabckoab';

    %// Hankel starts here
    c = (1 : len).';
    r = (len : length(s2)).';
    nr = length(r);
    nc = length(c);
    x = [ c; r((2:nr)') ];  %-- build vector of user data
    cidx = (1:nc)';
    ridx = 0:(nr-1);
    H = cidx(:,ones(nr,1)) + ridx(ones(nc,1),:);  % Hankel subscripts
    ind = x(H);  % actual data
    %// End Hankel script

    %// Now get our data
    subseqs = s2(ind.');

    %// Case where string length is 1
    if len == 1
        subseqs = subseqs.';
    end

<hr>
subseqs contains the matrix of overlapping substrings that you alluded to in your post. You've noticed a small bug: if the length of s1 is 1, the algorithm won't work as is, because the reshaped substring matrix must then be a single <strong>column</strong> vector. If we ran the above code without checking the length of s1, we would get a row vector, so we simply transpose the result in that case.
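As an aside, the same kind of index matrix can be built more compactly with bsxfun. This is only a sketch of the decomposition step, not the Hankel script from the linked post, and it assumes s1 is non-empty and no longer than s2. The name starts and the reshape call are my own additions; the reshape keeps subseqs a column when s1 has a single character, so no separate transpose is needed.

```matlab
s1 = 'abc';
s2 = 'kokoabckokabckoab';
len = numel(s1);

% Row k of ind holds the indices k, k+1, ..., k+len-1 into s2
starts = (0 : numel(s2) - len).';        % zero-based start offsets, one per row
ind = bsxfun(@plus, starts, 1 : len);    % (numel(s2)-len+1)-by-len index matrix

% Indexing a row vector with a column vector returns a row, so force the
% result back to the shape of ind (this only matters when len == 1)
subseqs = reshape(s2(ind), size(ind));

disp(size(subseqs))   % 15-by-3 for this example
```

Each row of subseqs is one window of s2, so row 5 is 'abc' here.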
Now, simply replicate s1 as many times as there are rows in subseqs so that all of these copies get stacked into a 2D matrix. Afterwards, compare the two matrices element-wise with the equality operator:
    eqs = subseqs == repmat(s1, size(subseqs,1), 1);
Now, sum along each row and see which sums are equal to the length of your string. This will produce a single column vector where 1 indicates that we have found a match, and 0 otherwise:
    sum(eqs, 2) == len

    ans =

         0
         0
         0
         0
         1
         0
         0
         0
         0
         0
         1
         0
         0
         0
         0
Finally, to add up <strong>how many times</strong> the substring matched, you just have to add up all elements in this vector:
    out = sum(sum(eqs, 2) == len)

    out =

         2

<hr>
As such, we have <strong>two</strong> instances where abc is found in your string.
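For what it's worth, the comparison and counting steps can also be written without repmat, by letting bsxfun expand s1 across the rows and using all and nnz in place of the two sums. This is a minimal sketch under the assumption that s1 is non-empty and no longer than s2; the names isMatch and count are mine.

```matlab
s1 = 'abc';
s2 = 'kokoabckokabckoab';
len = numel(s1);

% Overlapping substrings of s2, one per row (row k starts at s2(k))
ind = bsxfun(@plus, (0 : numel(s2) - len).', 1 : len);
subseqs = reshape(s2(ind), size(ind));

% Compare every row against s1; a row matches when all of its characters agree
isMatch = all(bsxfun(@eq, subseqs, s1), 2);

% Number of matching rows
count = nnz(isMatch)   % displays 2 for this example
```

Using all(..., 2) avoids comparing a sum against len, which reads a little more directly as "every character in the row matched".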
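Here is another one:

    s1 = 'abc';
    s2 = 'bkcokbacaabcsoabckokabckoabc';
    [a, b] = ismember(s2, s1);       % b(k) is the position of s2(k) in s1, or 0 if absent
    b = [0 0 b 0 0];                 % zero-pad so the circular shifts cannot wrap a match around
    a1 = circshift(b, [0 -1]);       % b shifted left by one
    a2 = circshift(b, [0 -2]);       % b shifted left by two
    sum((b==1) & (a1==2) & (a2==3))  % count positions where a, b, c appear consecutively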
This gives 3 for your input and 4 for my example, and it seems to work well if ismember is allowed.
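The shift-and-compare idea does not actually need ismember, and the version above relies on the characters of s1 being distinct, since ismember reports only one position per character. Here is a loop-based sketch that compares s2 against each character of s1 directly and so handles a pattern of any length; n, match, and count are illustrative names, not part of the code above.

```matlab
s1 = 'abc';
s2 = 'bkcokbacaabcsoabckokabckoabc';

% Number of candidate start positions in s2
n = numel(s2) - numel(s1) + 1;

% match(k) stays true only while every character of s1 agrees with
% the corresponding character of s2 starting at position k
match = true(1, n);
for k = 1:numel(s1)
    match = match & (s2(k : k + n - 1) == s1(k));
end

count = sum(match)   % displays 4 for this example
```

Slicing s2 at offset k plays the role of circshift here, so no zero-padding is needed.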
Just for the fun of it: this can be done with nlfilter from the Image Processing Toolbox (I just discovered this function today and am eager to apply it!):
    ds1 = double(s1);
    ds2 = double(s2);
    result = sum(nlfilter(ds2, [1 numel(ds1)], @(x) all(x==ds1)));