Merge pull request ioccc-src#2554 from xexyl/gson92-bugs
Change bug status in 1992/gson
lcn2 committed Jul 9, 2024
2 parents dd2c952 + 03b9d7e commit f2d6303
Showing 7 changed files with 218 additions and 208 deletions.
95 changes: 46 additions & 49 deletions 1992/gson/README.md
@@ -14,6 +14,7 @@ The current status of this entry is:

```
STATUS: uses gets() - change to fgets() if possible
STATUS: INABIAF - please DO NOT fix
```

For more detailed information see [1992/gson in bugs.html](../../bugs.html#1992_gson).
@@ -22,7 +23,7 @@ For more detailed information see [1992/gson in bugs.html](../../bugs.html#1992_
## To use:

``` <!---sh-->
./ag word word2 word3 < /path/to/dictionary
./ag word word2 word3 < dictionary
```


@@ -36,30 +37,25 @@ This script will determine (or try and determine) where your system dictionary
is located and assuming that it can find one it'll use that. It checks the
following locations though the last one is more ironic:

```
/usr/share/dict/words
/usr/share/lib/spell/words
/usr/ucblib/dict/words
/dev/null # <-- for machines with nothing to say
```

Then using the proper dictionary file it does:
* `/usr/share/dict/words`
* `/usr/share/lib/spell/words`
* `/usr/ucblib/dict/words`
* `/dev/null` # <-- for machines with nothing to say

``` <!---sh-->
./ag free software foundation < /usr/share/dict/words
./ag obfuscated c contest < /usr/share/dict/words
./ag unix international < /usr/share/dict/words
./ag george bush < /usr/share/dict/words
./ag bill clinton < /usr/share/dict/words
./ag ross perot < /usr/share/dict/words
./ag paul e tsongas < /usr/share/dict/words
```
Then it runs the program with the following strings, using the proper dictionary
file:

where `/usr/share/dict/words` is the dictionary file.
* `free software foundation`
* `obfuscated c contest`
* `unix international`
* `george bush`
* `bill clinton`
* `ross perot`
* `paul e tsongas`

Then it uses the [mkdict.sh](%%REPO_URL%%/1992/gson/mkdict.sh) script to create a dictionary file out
of the files [index.html](index.html), [try.sh](%%REPO_URL%%/1992/gson/try.sh) (itself) and
[Makefile](%%REPO_URL%%/1992/gson/Makefile) and it repeats the same commands as above. In the case no
of the files [README.md](README.md), [try.sh](%%REPO_URL%%/1992/gson/try.sh) (itself) and
[Makefile](%%REPO_URL%%/1992/gson/Makefile) and it repeats the same process as above. In the case no
dictionary file can be found in the first step it only runs the commands once
with the created dictionary file.

@@ -84,20 +80,20 @@ Then try using the program as shown above with the file `words`.

The name of the game:

AG is short for either Anagram Generator or simply AnaGram. It might also be
construed to mean Alphabet Game, and by pure coincidence it happens to be the
`AG` is short for either `Anagram Generator` or simply `AnaGram`. It might also be
construed to mean `Alphabet Game`, and by pure coincidence it happens to be the
author's initials.


### What it does

AG takes one or more words as arguments, and tries to find anagrams of those
`AG` takes one or more words as arguments, and tries to find anagrams of those
words, i.e. words or sentences containing exactly the same letters.


### How to use it

To run AG, you need a dictionary file consisting of distinct words in the
To run `AG`, you need a dictionary file consisting of distinct words in the
natural language of your choice, one word on each line. If your machine doesn't
have one already, you can make your own dictionary by concatenating a few
hundred of your favourite Usenet articles and piping them through the following
@@ -108,10 +104,10 @@ obfuscated shell script:
z=a-z];tr [A-Z\] \[$z|sed s/[\^$z[\^$z*/_/g|tr _ \\012|grep ..|sort -u
```

Using articles from alt.folklore.computers is likely to make
a more professional-looking dictionary than rec.arts.erotica.
Using articles from `alt.folklore.computers` is likely to make
a more professional-looking dictionary than `rec.arts.erotica`.

AG must be run with the dictionary file as standard input.
`AG` must be run with the dictionary file as standard input.

Because anagrams consisting of just a few words are generally more
meaningful than those consisting of dozens of very short words, the
@@ -122,46 +118,47 @@ limit can be changed using a numeric command line option, as in
### Bugs and limitations

- There is no error checking.
- Standard input must be seekable, so you can't pipe the dictionary into AG.
- Standard input must be seekable, so you can't pipe the dictionary into `AG`.
- The input sentence and each line in the dictionary may contain at most 32
distinct letters, and each letter may occur at most 15 times.
- Words in the dictionary may be at most 255 bytes long.
- AG cannot handle characters that sign-extend to negative values.
- Although AG works on both 16-bit and 32-bit machines, the size of the problems
- `AG` cannot handle characters that sign-extend to negative values.
- Although `AG` works on both 16-bit and 32-bit machines, the size of the problems
it can solve is severely limited on machines that limit the stack size to 64k or
less.
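The seekability limitation above can be checked before the program is run. The sketch below is an illustration only, not code from the entry: `stream_is_seekable` and `reread_line_at` are hypothetical helper names, and the idea that the program seeks back to offsets it recorded earlier is an assumption based on the `ftell(stdin)` call visible in `gson.c`.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if fp supports ftell()/fseek(), 0 otherwise
 * (on POSIX systems ftell() fails on pipes). */
int stream_is_seekable(FILE *fp)
{
    return ftell(fp) != -1L;
}

/* Seeks to a previously recorded offset and reads one line into buf.
 * Returns 1 on success, 0 if the seek or the read fails. */
int reread_line_at(FILE *fp, long pos, char *buf, int size)
{
    assert(buf != NULL && size > 0);
    if (fseek(fp, pos, SEEK_SET) != 0)
        return 0;
    return fgets(buf, size, fp) != NULL;
}
```

A caller could test `stream_is_seekable(stdin)` at startup and print a hint to use `< dictionary` redirection instead of a pipe.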


### NOTICE to those who wish for a greater challenge:

**If you want a greater challenge, don't read any further**:
just try to understand the program via the source.

If you get stuck, come back and read below for additional hints and information.


### Obfuscatory notes

As you can see, AG takes advantage of the new '92 whitespace rules' to
As you can see, `AG` takes advantage of the new '92 whitespace rules' to
achieve a clear, readable, self-documenting layout. The identifiers
have been chosen in a way appropriate for an alphabet game, and common
sources of bugs such as goto statements and malloc/free have been
eliminated. As AG also refrains from abusing the preprocessor, it
eliminated. As `AG` also refrains from abusing the preprocessor, it
doesn't really have much to offer in terms of "surface obfuscation".
Instead, it tries to achieve both its speed and its obscurity through a
careful choice of algorithms. Some of the finer points of those
algorithms are outlined in the section below.


### NOTICE to those who wish for a greater challenge

**If you want a greater challenge, don't read any further**:
just try to understand the program via the source.

If you get stuck, come back and read below for additional hints and information.


### How this entry works:

Here follows a description of some of the data structures and
algorithms used by AG. It is by no means complete, but it may help
algorithms used by `AG`. It is by no means complete, but it may help
you get an idea about the general principles.

<hr style="width:10%;text-align:left;margin-left:0">

Internally, AG represents words and sentences as arrays of 32
Internally, `AG` represents words and sentences as arrays of 32
4-bit integer elements. Each element represents the number of
times a letter occurs in the word/sentence. There are 32 elements
because 32 is a convenient power of two larger than the number of
@@ -186,23 +183,23 @@ iterations of a loop containing some 32-bit bitwise logical
operations, but no arithmetic operations other than those implied
by array indexing.

Subtraction works similarly, and in fact AG only implements
Subtraction works similarly, and in fact `AG` only implements
subtraction directly, handling addition by means of the identity
`a+b = a-(0-b)`.
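A minimal sketch of the transposed ("bit-sliced") arithmetic described above, written for readability rather than taken from the entry. It assumes four `uint32_t` planes instead of the original `long`s, where plane `k` holds bit `k` of all 32 counters. The names `lane_set`, `lane_get`, `sliced_sub` and `sliced_add` are hypothetical. The borrow update appears to mirror the expression `i&*p|(i|*p)&~*o` in `gson.c`, but that reading is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Stores the 4-bit value v in counter j (0..31). */
void lane_set(uint32_t p[4], int j, unsigned v)
{
    assert(j >= 0 && j < 32 && v < 16);
    for (int k = 0; k < 4; k++) {
        p[k] &= ~(UINT32_C(1) << j);
        p[k] |= (uint32_t)((v >> k) & 1u) << j;
    }
}

/* Reads counter j back as an ordinary integer. */
unsigned lane_get(const uint32_t p[4], int j)
{
    unsigned v = 0;
    for (int k = 0; k < 4; k++)
        v |= (unsigned)((p[k] >> j) & 1u) << k;
    return v;
}

/* q = a - b in all 32 counters at once (mod 16 per counter).
 * Returns a mask of the counters that underflowed. */
uint32_t sliced_sub(const uint32_t a[4], const uint32_t b[4], uint32_t q[4])
{
    uint32_t borrow = 0;
    for (int k = 0; k < 4; k++) {          /* four iterations, logic ops only */
        uint32_t x = a[k], y = b[k];
        q[k] = x ^ y ^ borrow;
        borrow = (borrow & y) | ((borrow | y) & ~x);
    }
    return borrow;
}

/* q = a + b, expressed through the identity a+b = a-(0-b). */
void sliced_add(const uint32_t a[4], const uint32_t b[4], uint32_t q[4])
{
    static const uint32_t zero[4] = {0, 0, 0, 0};
    uint32_t neg[4];
    sliced_sub(zero, b, neg);
    sliced_sub(a, neg, q);
}
```

The underflow mask returned by `sliced_sub` is what makes the representation convenient for anagram search: a non-zero result means the subtracted word uses some letter more often than the sentence has left.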

In addition to this `32*4`-bit representation, AG also forms a so-called
In addition to this `32*4`-bit representation, `AG` also forms a so-called
"signature" that is the bitwise OR of the four `long`s, which is
equivalent to saying that the signature of a word contains a logical 1
in the bit positions corresponding to letters occurring at least once
in that word.

The first thing AG does is to construct a lookup table of 256
The first thing `AG` does is to construct a lookup table of 256
`long`s, one for each 8-bit character value. The entry for a
character will be zero if that character doesn't appear in the
sentence given on the command line, or it will have a single bit
set if the character does appear in the sentence. By adding
together the bit masks for all the letters in the input sentence
using the transpose addition method described above, AG forms the
using the transpose addition method described above, `AG` forms the
`32*4` bit array representation of the input sentence.
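The table construction described above might look roughly like the following sketch. It is not the entry's code: `encode_sentence` and `sliced_inc` are hypothetical names, bits are assumed to be handed out in order of first appearance, and adding a single-bit mask is simplified here to a ripple-carry increment instead of the general transposed addition.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Adds 1 to every counter selected by mask (mod 16 per counter). */
static void sliced_inc(uint32_t planes[4], uint32_t mask)
{
    uint32_t carry = mask;
    for (int k = 0; k < 4 && carry; k++) {
        uint32_t t = planes[k] & carry;   /* counters whose bit k overflows */
        planes[k] ^= carry;
        carry = t;
    }
}

/* Fills table[] with one bit per distinct character of the given words and
 * accumulates the letter counts in planes[]. Returns the number of distinct
 * characters. Characters beyond the 32nd distinct one are ignored here,
 * which is an assumption of this sketch. */
int encode_sentence(int nwords, const char *const words[],
                    uint32_t table[256], uint32_t planes[4])
{
    int next = 0;
    assert(table != NULL && planes != NULL);
    memset(table, 0, 256 * sizeof table[0]);
    memset(planes, 0, 4 * sizeof planes[0]);
    for (int w = 0; w < nwords; w++) {
        for (const unsigned char *s = (const unsigned char *)words[w]; *s; s++) {
            if (table[*s] == 0 && next < 32)
                table[*s] = UINT32_C(1) << next++;
            sliced_inc(planes, table[*s]);
        }
    }
    return next;
}
```

With the words `ross` and `perot`, the letters `r`, `o`, `s` each occur twice and `p`, `e`, `t` once, so plane 1 ends up with the three lowest bits set and plane 0 with the next three.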

The next action performed is reading the dictionary. Those words that
@@ -243,12 +240,12 @@ maximum number of words in the anagram, as specified by the user.

When the deepest recursion level has been reached, an optimization can
be applied: because no further recursion will be done, there is no
need to look for partial anagrams, and therefore AG only needs to
need to look for partial anagrams, and therefore `AG` only needs to
check for words that contain exactly the same letters as the current
sentence. Those words can be found simply by indexing the hash table
with the signature of the current sentence.

Even when not on the deepest recursion level, AG generally avoids
Even when not on the deepest recursion level, `AG` generally avoids
examining all the entries of the hash table. The idea is that we are
not interested in hash buckets whose words contain any letters not
in the current sentence; these buckets are exactly those whose index
@@ -270,7 +267,7 @@ even bit positions:
main(){int i=0,s=0xAAAA;do{printf("%04x\t",i);}while(i=((i|~s)+1)&s);}
```

AG uses a similar method but works in the opposite direction, finding
`AG` uses a similar method but works in the opposite direction, finding
the next lower value with zeroes in given bit positions by propagating
borrows across those bits. Some additional adjustments are made
to the hash table index when initiating a recursive search, using
Expand Down
10 changes: 3 additions & 7 deletions 1992/gson/gson.c
@@ -1,16 +1,12 @@
#include <stdio.h>
#include <unistd.h>
#include <limits.h>
#ifndef ARG_MAX
#define ARG_MAX _POSIX_ARG_MAX
#endif

long a
[4],b[
4],c[4]
,d[0400],e=1;
typedef struct f{long g
,h,i[4] ,j;struct f*k;}f;f g,*
l[4096 ]; char h[ARG_MAX+1],*m,k=3;
l[4096 ]; char h[256],*m,k=3;
long n (o, p,q)long*o,*p,*q;{
long r =4,s,i=0;for(;r--;s=i^
*o^*p, i=i&*p|(i|*p)&~*o++,*q
@@ -37,7 +33,7 @@ s,i=o->h;q.k=o;r>i?j=l[r=i]:r<i&&
j;char *z,*p;
for(;m ? j.j=
ftell( stdin)
,7,(m= gets(m ))||w(
,7,(m= gets(m ))||w(
&g,315 *13,l[ 4095]
,k,64* 64)&0: 0;n(g
.i,j.i, b)||(u (&j),j.
Expand Down
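For context on the status line this commit replaces: `gets()` has no way to bound how much it writes, while `fgets()` takes the buffer size but keeps the trailing newline, so the two are not drop-in replacements for each other. The entry keeps `gets()` on purpose (the new status asks that it not be fixed), so the following is only an illustration of a bounded read, not a proposed patch. `read_word` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads one line of at most size-1 bytes into buf and strips the newline
 * that fgets() keeps. Returns 1 on success, 0 at end of file or on error. */
int read_word(FILE *fp, char *buf, int size)
{
    assert(buf != NULL && size > 0);
    if (fgets(buf, size, fp) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

With a 256-byte buffer this would accept words of up to 255 bytes, matching the limit stated in the README; longer lines would be split across successive reads instead of overflowing the buffer.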