59678

Run Length Encoding in Haskell

Question:

import Data.List data Encoding = Multiple Int Char | Single Char deriving (Eq,Show,Ord)

Encoding of run length

encode :: String -> [Encoding] encode inputString =encoding (group inputString) [] encoding :: [String] -> [Encoding] -> [Encoding] encoding groupString xs= if (length groupString == 0) then xs else case (head groupString) of ([c]) ->encoding (tail groupString) (xs ++ [Single c]) (x) -> encoding (tail groupString) (xs ++ [Multiple (length x) (head x)])

Decoding of run length

decode :: [Encoding] -> String decode listString = decoding listString [] decoding :: [Encoding] -> String -> String decoding inputString xs= if (length inputString == 0) then xs else case (head inputString) of (Single x) ->decoding (tail inputString) (xs++ [x]) (Multiple num x) ->decoding (tail inputString) (xs ++ (replicate num x) )

This is my run length encoding solution, can anyone suggest me a better way to write encoding and decoding functions

Answer1:

A lot of your code is dedicated to updated an accumulator. You add elements to the tail of that accumulator, and this will have a drastic impact on performance.

The reason this is usually not very efficient is because a list in Haskell [a] is implemented - at least conceptually - as a <em>linked list</em>. As a result concatenating two lists l1 and l2 together with l1 ++ l2, will take <em>O(|l1|)</em> time (that is the number of elements in l1). So that means that if the list already contains 100 elements, adding one extra at the end will require a lot of work.

Another anti-pattern is the use of head and tail. Yes these functions can be used, but unfortunately since you do not use pattern matching, it could happen that an empty list is passed, and then head and tail will error.

Another problem here is that you use length on a list. Since sometimes lists in Haskell can have infinite length, the length call will - if we need it - never ends.

In cases where you have to append at the end of the accumulator, usually we can write the recursion in the tail of a list "cons" we are building. So we can rewrite our program from:

encode :: String -> [Encoding] encode [] = [] encode (h:t) = ...

The question is now how we can count these values. We can use <a href="http://hackage.haskell.org/package/base-4.10.1.0/docs/Prelude.html#v:span" rel="nofollow"><strong>span :: (a -> Bool) -> [a] -> ([a],[a])</strong></a>, a function that will - for a given predicate and a list - construct a 2-tuple where the first item contains the prefix of the list where all elements satisfy the predicate, and the second item contains the rest of the list, so we can use this like:

encode :: String -> [Encoding] encode [] = [] encode (h:t) | nh > 1 = Multiple nh h : tl | otherwise = Single h : tl where (s1, s2) = span (h ==) t nh = 1 + length s1 tl = encode s2

For example:

Prelude Data.List> encode "Foobaaar qqquuux" [Single 'F',Multiple 2 'o',Single 'b',Multiple 3 'a',Single 'r',Multiple 3 ' ',Multiple 3 'q',Multiple 3 'u',Single 'x']

For the decoding, we again do not need an accumulator, and can make use of <a href="http://hackage.haskell.org/package/base-4.10.1.0/docs/Prelude.html#v:replicate" rel="nofollow"><strong>replicate :: Int -> a -> [a]</strong></a>:

decode :: [Encoding] -> String decode [] = [] decode (Single h:t) = h : decode t decode (Multiple nh h) = replicate nh h ++ decode t

Or we can use a list monad instead:

decode :: [Encoding] -> String decode = (>>= f) where f (Single h) = [h] f (Multiple nh h) = replicate nh h

For example:

Prelude Data.List> decode [Single 'F',Multiple 2 'o',Single 'b',Multiple 3 'a',Single 'r',Multiple 3 ' ',Multiple 3 'q',Multiple 3 'u',Single 'x'] "Foobaaar qqquuux"

Answer2:

As an extension to <a href="https://stackoverflow.com/users/67579/willem-van-onsem" rel="nofollow">Willem Van Onsem's</a> excellent <a href="https://stackoverflow.com/a/48670039/3058609" rel="nofollow">answer</a>, consider creating the run lengths separately, then combining them together with their letters with zipWith.

Data.List has the function <a href="https://hackage.haskell.org/package/base-4.10.1.0/docs/Data-List.html#v:group" rel="nofollow">group</a> (which itself is a specialization of the generic groupBy; group = groupby (==)) which breaks strings into homogenous substrings. i.e.:

group "aaabbccccd" = ["aaa", "bb", "cccc", "d"]

Calculating the length of each will give you run-length.

N.B. that group is implemented exactly the same way as Willem's span solution.

import Data.List (group) data Encoding = Multiple Int Char | Single Char deriving (Eq, Show, Ord) encode :: String -> [Encoding] encode xs = zipWith op lengths letters where groups = group xs lengths = map length groups letters = map head groups op :: Int -> Char -> Encoding op 1 = Single op n = Multiple n

This could also be done as one very ugly list comprehension.

encode xs = [ let (n, c) = (length g, head g) in if n == 1 then Single c else Multiple n c | g <- group xs ]

Answer3:

<ol><li>

Your encoding function is a functional map. Instead of making your own primitive recursive function just use map.

</li> <li>

Your encoding is reversing the output (xs ++ [Single c] etc) which is both counter intuitive and computationally expensive. Stop it.

</li> <li>

Too many parenthesis, such as on case (..) of, if (...) then and the arms of the case (...) ->. All these are unneeded and clutter the code.

</li> <li>

If you type head chances are you should have pattern matched somewhere instead.

</li> </ol>

Consider:

encoding :: String -> [Encoding] encoding = map enc . group -- Point 1, use map which also takes -- care of point 2 and 3. where enc [x] = Single x enc xs@(x:_) = Multiple (length xs) x -- Point 4, patterns not `head` -- Here consider make pattern matches total either via an error call or Maybe type

Recommend

  • How do you stack two Pandas Dataframe columns on top of each other?
  • Print bullet before each sentence + new line after each sentence SQL
  • linked lists with inherited classes in c++?
  • Is using parallel collections encouraged in Spark
  • Embed the existing code of a method in a try-finally block
  • StringBuilder vs ampersand equals concatentation
  • Pandas filter dataframe rows with a specific year
  • python intersect of dict items
  • Formula in Excel that references another Excel file based on cell reference
  • do i need to rewrite the case statement for every field?
  • How to return JSON from spring RESTful service and access it using RestTemplate class
  • Javascript and VERY LONG string
  • Cross platform UI spacing/padding
  • Retrieving specified columns from a list of csv files to create a data data frame in R
  • How to override value that appears in a dropdown in the rails_admin gem
  • How do I include a SWC in an AS2 Flash project?
  • How to add a focus style to an editable ComboBox in WPF
  • How do I superscript characters in a UIButton?
  • Database structure design with variable amounts of fields
  • Highlight one bar in a series in highcharts?
  • Is there a javascript serializer for JSON.Net?
  • Is my CUDA kernel really runs on device or is being mistekenly executed by host in emulation?
  • How to recover from a Spring Social ExpiredAuthorizationException
  • Does CUDA 5 support STL or THRUST inside the device code?
  • Where to put my custom functions in Wordpress?
  • When should I choose bucket sort over other sorting algorithms?
  • How do you troubleshoot character encoding problems?
  • XCode can't find symbols for a specific iOS library/framework project
  • php design question - will a Helper help here?
  • Comma separated Values
  • Buffer size for converting unsigned long to string
  • SQL merge duplicate rows and join values that are different
  • AngularJs get employee from factory
  • Understanding cpu registers
  • Can Visual Studio XAML designer handle font family names with spaces as a resource?
  • IndexOutOfRangeException on multidimensional array despite using GetLength check
  • Authorize attributes not working in MVC 4
  • How can I remove ASP.NET Designer.cs files?
  • Are Kotlin's Float, Int etc optimised to built-in types in the JVM? [duplicate]
  • Binding checkboxes to object values in AngularJs