
Is it possible to create an algorithm which generates an autogram?

Question:

An <a href="http://en.wikipedia.org/wiki/Autogram" rel="nofollow">autogram</a> is a sentence that describes the characters it contains, usually by enumerating each letter of the alphabet, and possibly also its punctuation. Here is the example given on the Wikipedia page.

<blockquote>

This sentence employs two a’s, two c’s, two d’s, twenty-eight e’s, five f’s, three g’s, eight h’s, eleven i’s, three l’s, two m’s, thirteen n’s, nine o’s, two p’s, five r’s, twenty-five s’s, twenty-three t’s, six v’s, ten w’s, two x’s, five y’s, and one z.

</blockquote>

Coming up with one is hard, because you don't know how many of each letter it contains until you finish the sentence. This is what prompts me to ask: is it possible to write an algorithm which could create an autogram? For example, the input would be the start of the sentence, e.g. "This sentence employs", and the output would follow the same format as the example above: "x a's, ... y z's".

I'm not asking for you to actually write an algorithm, although by all means I'd love to see if you know one to exist or want to try and write one; rather I'm curious as to whether the problem is computable in the first place.

Answer1:

You are asking two different questions.

"is it possible to write an algorithm which could create an autogram?"

There are algorithms to find autograms. As far as I know, they use randomization, which means that such an algorithm might find a solution for a given start text, but if it doesn't find one, then this doesn't mean that there isn't one. This takes us to the second question.

"I'm curious as to whether the problem is computable in the first place."

Computable would mean that there is an algorithm which, for a given start text, either outputs a solution or states that there isn't one. The above-mentioned algorithms can't do that, and an exhaustive search is not workable. Therefore I'd say that this problem is not computable in any practical sense. However, this is mostly of academic interest: the randomized algorithms work well enough.
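For illustration, here is a minimal Python sketch of what a randomized search of this kind could look like. It is not one of the specific algorithms alluded to above (none is named); the helper names `itoe`, `build`, `is_consistent` and `search`, the update rule, and the restriction to counts 1..99 are all assumptions made for this sketch. It may well return `None`, which says nothing about whether a solution exists.

```python
import random
import string
from collections import Counter

UNITS = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
         "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def itoe(n):
    """Number-word for 1 <= n <= 99 (assumed limit for this sketch)."""
    if not 1 <= n <= 99:
        raise ValueError("only 1..99 supported in this sketch")
    if n < 20:
        return UNITS[n]
    tens, unit = divmod(n, 10)
    return TENS[tens] + ("-" + UNITS[unit] if unit else "")

def build(prefix, counts):
    """Render 'prefix two a's, ..., and one z.' for every nonzero claimed count."""
    items = [f"{itoe(n)} {c}" + ("'s" if n != 1 else "")
             for c, n in sorted(counts.items()) if n > 0]
    if len(items) > 1:
        body = ", ".join(items[:-1]) + ", and " + items[-1]
    else:
        body = "".join(items)
    return f"{prefix} {body}."

def letter_counts(text):
    """Count the letters a..z in text, ignoring case and punctuation."""
    return Counter(c for c in text.lower() if c in string.ascii_lowercase)

def is_consistent(prefix, counts):
    """True if the rendered sentence really contains the claimed counts."""
    real = letter_counts(build(prefix, counts))
    return all(real[c] == counts.get(c, 0) for c in string.ascii_lowercase)

def search(prefix, max_steps=20000, seed=0):
    """Randomized fixed-point search; returns a counts dict or None."""
    rng = random.Random(seed)
    counts = letter_counts(prefix)  # initial guess: just the prefix letters
    for _ in range(max_steps):
        if is_consistent(prefix, counts):
            return dict(counts)
        real = letter_counts(build(prefix, counts))
        for c in string.ascii_lowercase:
            if real[c] != counts[c]:
                # Move the claim to a random value between claim and reality.
                lo, hi = sorted((counts[c], real[c]))
                counts[c] = min(99, rng.randint(lo, hi))
    return None

# Usage: result = search("This sentence employs")
# If result is not None, build("This sentence employs", result) is an autogram.
```

The random step is what keeps the iteration from cycling between the same few wrong guesses; a purely deterministic "replace each claim by the real count" update tends to loop.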

Answer2:

Let's assume for the moment that all counts are less than or equal to some maximum M, with M < 100. As mentioned in the OP's link, this means that we only need to decide counts for the 16 letters that appear in these number words, as counts for the other 10 letters are already determined by the specified prefix text and can't change.

One property that I think is worth exploiting is the fact that, if we take some (possibly incorrect) solution and rearrange the number-words in it, then the total letter counts don't change. In other words, if we ignore the letters spent "naming themselves" (e.g. the c in two c's), then the total letter counts depend only on the <em>multiset</em> of number-words that are actually present in the sentence. This means that instead of having to consider all possible ways of assigning one of M number-words to each of the 16 letters, we can enumerate just the (much smaller) set of all multisets of size 16 or less whose elements are drawn from the ground set of M number-words, and for each multiset, check whether we can <em>fit</em> the 16 letters to its elements in a way that uses each multiset element exactly once.

Note that a multiset of numbers can be uniquely represented as a nondecreasing list of numbers, and this makes them easy to enumerate.
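As a small illustration of the enumeration, Python's `itertools.combinations_with_replacement` yields exactly these nondecreasing tuples. The function name `multisets` and the values of `M` and `k` below are only example choices, and the sketch enumerates one fixed size at a time (a full search would loop over sizes up to 16).

```python
from itertools import combinations_with_replacement
from math import comb

def multisets(max_value, size):
    """Yield each multiset of `size` integers from 1..max_value
    as a nondecreasing tuple."""
    return combinations_with_replacement(range(1, max_value + 1), size)

# Multisets of size 2 drawn from {1, 2, 3}.
print(list(multisets(3, 2)))
# -> [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]

# Size of the two search spaces for example values M = 30, k = 16:
M, k = 30, 16
ordered_assignments = M ** k            # one number-word per letter
multiset_count = comb(M + k - 1, k)     # multisets of size exactly k
print(ordered_assignments > multiset_count)
```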

What does it mean for a letter to "fit" a multiset? Suppose we have a multiset W of number-words; this determines total letter counts for each of the 16 letters (for each letter, just sum the counts of that letter across all the number-words in W; also add a count of 1 for the letter "S" for each number-word besides "one", to account for the pluralisation). Call these letter counts f["A"] for the frequency of "A", etc. Pretend we have a function etoi() that operates like C's atoi(), but returns the numeric value of a number-word. (This is just conceptual; of course in practice we would always generate the number-word from the integer value (which we would keep around), and never the other way around.) Then a letter x fits a particular number-word w in W if and only if f[x] + 1 = etoi(w), since writing the letter x itself into the sentence will increase its frequency by 1, thereby making the two sides of the equation equal.
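A rough Python sketch of this definition follows. It assumes W is kept as a list of (value, word) pairs, so no `etoi()` is needed; the names `implied_frequencies` and `fits` are made up for the example, and letters contributed by the prefix text are left out, as in the description above.

```python
from collections import Counter

def implied_frequencies(W):
    """Letter counts implied by the number-words in W.

    W is a list of (value, word) pairs, e.g. (3, "three").
    Letters that "name themselves" are deliberately not counted here.
    """
    f = Counter()
    for value, word in W:
        f.update(ch for ch in word if ch.isalpha())  # skip hyphens
        if value != 1:
            f["s"] += 1  # plural 's' for every number-word except "one"
    return f

def fits(x, value, f):
    """Letter x fits a number-word with this value iff f[x] + 1 == value."""
    return f[x] + 1 == value

W = [(2, "two"), (3, "three")]
f = implied_frequencies(W)
print(f["t"], f["e"], f["s"])   # -> 2 2 2
print(fits("t", 3, f))          # -> True  (2 t's plus the t naming itself)
print(fits("e", 2, f))          # -> False (2 e's + 1 is 3, not 2)
```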

This does not yet address the fact that if more than one letter fits a number-word, only one of them can be assigned it. But it turns out that it is easy to determine whether a given multiset W of number-words, represented as a nondecreasing list of integers, simultaneously fits <em>any</em> set of letters:

<ul>
<li>Calculate the total letter frequencies f[] that W implies.</li>
<li>Sort these frequencies in nondecreasing order.</li>
<li>Skip past any zero-frequency letters. Suppose there were k of these.</li>
<li>For each remaining letter, check whether its frequency is equal to one less than the numeric value of the number-word in the corresponding position. I.e. check that f[k] + 1 == etoi(W[0]), f[k+1] + 1 == etoi(W[1]), etc.</li>
<li>If and only if all these frequencies agree, we have a winner!</li>
</ul>
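The steps above could be sketched in Python roughly as follows. The sketch takes the implied frequencies as an input list rather than computing them, and the function name `simultaneously_fits` is invented for the example. The explicit length check is an assumption on my part: it encodes the requirement that each multiset element is used exactly once.

```python
def simultaneously_fits(W, freqs):
    """Check whether multiset W fits some set of letters.

    W     : numeric values of the number-words (any order).
    freqs : implied total frequency of each candidate letter.
    """
    nonzero = sorted(x for x in freqs if x > 0)   # sort, skip the zeros
    if len(nonzero) != len(W):                    # assumed: one word per letter
        return False
    # Each frequency must be one less than the value in the same position.
    return all(fx + 1 == w for fx, w in zip(nonzero, sorted(W)))

print(simultaneously_fits([2, 3, 3], [0, 1, 2, 2, 0]))  # -> True
print(simultaneously_fits([2, 3, 3], [1, 2, 3]))        # -> False
```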

The above approach is naive in that it assumes that we choose words to put in the multiset from a size M ground set. For M > 20 there is a lot of structure in this set that can be exploited, at the cost of slightly complicating the algorithm. In particular, instead of enumerating straight multisets of this ground set of all allowed numbers, it would be much better to enumerate multisets of {"one", "two", ..., "nineteen", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"}, and then allow the "fit detection" step to combine the number-words for multiples of 10 with the single-digit number-words.
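To make the reduced ground set concrete, here is a small Python sketch that splits any value from 1 to 99 into elements of that 27-word set. The names `SMALL`, `TENS`, `GROUND` and `pieces` are hypothetical. Since a hyphen contributes no letters, the letter counts of a compound such as "twenty-eight" are just the sum of the counts of its pieces.

```python
SMALL = ["one", "two", "three", "four", "five", "six", "seven", "eight",
         "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
GROUND = SMALL + TENS   # 27 words instead of one word per value up to M

def pieces(n):
    """Split 1 <= n <= 99 into the ground-set words that spell it."""
    if not 1 <= n <= 99:
        raise ValueError("only 1..99 supported in this sketch")
    if n < 20:
        return [SMALL[n - 1]]
    tens, unit = divmod(n, 10)
    parts = [TENS[tens - 2]]
    if unit:
        parts.append(SMALL[unit - 1])
    return parts

print(len(GROUND))   # -> 27
print(pieces(28))    # -> ['twenty', 'eight']
print(pieces(13))    # -> ['thirteen']
```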
