# Arranging text: some strategies

By [Allison Parrish](https://www.decontextualize.com/). Part of [Human Scale Natural Language Processing](https://hsnlp.decontextualize.com/).

> \[W\]e can start our work... connecting elements according to our own heart's desires, without consideration of pre-privileged structures or hierarchical paths which have been put in place to guarantee exegetical results in a frame of hegemonic unity and harmony. \[...\] The Queer reader becomes a warrior when she... builds guerrilla paths between scenes and concepts. (Marcella Althaus-Reid, *The Queer God*)

In this notebook, I show some methods for taking a text apart and putting it back together, using computational rules and procedures. We'll need randomness everywhere, so:

In [5]:
import random

## Disarticulating the text

First, let's load in a text. In this notebook, we will be using the collective corpora text, but you can use your own plain text file here as well.

In [7]:
text = open("collective-corpora.txt", encoding="latin-1").read()

Now `text` is a string containing the entire contents of the file that you loaded. You can make a string into the list of the characters in the string like so:

In [8]:
characters = list(text)

In [9]:
print(random.sample(characters, 12))

['e', 'l', 'h', 'a', 't', 't', 'k', ' ', 'b', 'e', 'm', 'o']


Or you can get a list of words (well, sorta, see below) using the `.split()` method of the string:

In [10]:
words = text.split()

In [11]:
print(random.sample(words, 12))

['through', 'the', 'their', 'celebration', 'the', 'it', 'History', 'uncomfortably', 'reminder', 'Abel', 'shape', 'them']


The `.split()` method takes a parameter, which is a string to use as the *delimiter* between the broken-up units. If you don't supply a value for this parameter, the method splits on *white space* (i.e., any of a list of characters that Python considers to be a delimiter between words, such as spaces, newline characters, tabs, etc.). This is convenient, though you'll notice that splitting this way keeps punctuation (like commas and periods) attached to the words that precede them.
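
If the attached punctuation gets in your way, one possible workaround is to strip it from both ends of every word. This is only a sketch: the `sample` string and the variable names are made up for illustration, and `string.punctuation` covers ASCII punctuation only (so curly quotes, for instance, would survive).

```python
import string

# a made-up sample, standing in for the text you loaded
sample = "It was good, the squirrel said. (The end!)"

sample_words = sample.split()

# .strip() with an argument removes any of the given characters from
# both ends of a string, leaving punctuation inside a word alone
stripped = [w.strip(string.punctuation) for w in sample_words]

print(sample_words)
print(stripped)
```

Whether you want this depends on the piece: sometimes the stray commas and periods are exactly what makes the fragments interesting.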

### Lines

The line break is one of the most basic units of textual organization and layout. Consequently, there is a character in Unicode that means "put a line break here": the newline character. Under the hood, the newline character is just a single character like any other (e.g., 'a', ':', '?', etc.), but in order to represent the character in Python strings, we use what's called an *escape sequence* (often loosely called an escape character), which is special syntax for writing a character that might otherwise be difficult to type or break the rules of Python syntax. The escape sequence in this case is `\n`. It looks like this:

In [12]:
poem = "whose woods these are, i think i know\nhis house is in the village though"

If you pass this variable to `print()` (or anything that evaluates to a string that has a newline escape character in it), Python will render the newline character as a line break:

In [13]:
print(poem)

whose woods these are, i think i know
his house is in the village though


Note, however, that simply *evaluating* the string in your notebook will leave the escape character intact:

In [14]:
poem

'whose woods these are, i think i know\nhis house is in the village though'

The rule here works like this: *evaluating* a variable (or any expression) as the final statement in a Jupyter Notebook cell outputs a *representation* of the resulting value that you could easily copy and paste elsewhere to recover the original value. Using the `print()` function *renders* the value in a way meant to be human-readable. Sometimes the two of these are identical (in the case of printing and evaluating, say, an integer), but with strings, they are distinct.
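
You can also ask for the representation explicitly, from anywhere in a cell, with Python's built-in `repr()` function. A small sketch (the `shown` variable is just a name chosen for this example):

```python
poem = "whose woods these are, i think i know\nhis house is in the village though"

# repr() returns the same "representation" string that the notebook
# displays when a cell ends with a bare expression
shown = repr(poem)
print(shown)

# in the string itself, the escape sequence is one single character
print(len("a\nb"))
```

This can be handy inside a loop, where only `print()` produces output and you still want to see exactly which characters a string contains.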

Often in plain text, line breaks fall between lines of a poem. For example:

In [10]:
jabberwocky = "'Twas brillig, and the slithy toves\ndid gyre and gimble in the wabe\nall mimsy were the borogoves\netc."
print(jabberwocky)

'Twas brillig, and the slithy toves
did gyre and gimble in the wabe
all mimsy were the borogoves
etc.


In this case, you can use `.split("\n")` to break the text up into a list of its component lines:

In [11]:
lines = jabberwocky.split("\n")
print(random.choice(lines))

did gyre and gimble in the wabe


Many plain text files, such as those in [Project Gutenberg](https://www.gutenberg.org/), use line breaks as a way to format the text for easy viewing in fixed-width fonts on computer screens (a throwback to when fixed-width terminals were the primary means of reading and interacting with text on computers). In this case, breaking up the text by line breaks gives you lines of more or less equal length, where the breaks fall in spots that have no particular semantic or syntactic importance. For example, with our Frankenstein text:

In [12]:
lines = text.split("\n")
print(random.choice(lines))

when I first saw the ocean he was but one day's journey in advance, and


Juxtaposing lines that have been separated in this way can spark potentially interesting turns of phrase, e.g.:

In [13]:
for i in range(3):
    print(random.choice(lines))

of night, for in sleep I saw my friends, my wife, and my beloved
hiding-places.  They ascend into the heavens; they have discovered how



Another common arrangement is for each line of a text file to contain an entire paragraph. Breaking the string using `\n` in this case will give you a list of paragraphs in the text. You can also use this for poetic juxtapositions. This is a poetry of textual organization, rather than the lower-level poetry of more atomic linguistic units that will be the focus of our discussion for the most part.
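
Here is a minimal sketch of the paragraph-per-line case. The `sample` string is made up, and I'm assuming (as is common, but not guaranteed) that such files may put blank lines between paragraphs, which would otherwise show up as empty strings in the list:

```python
import random

# a made-up stand-in for a file that has one paragraph per line
sample = "First paragraph, all on one line.\n\nSecond paragraph.\nThird paragraph."

# split on newlines, then drop the empty strings left by blank lines
paragraphs = [p for p in sample.split("\n") if p.strip() != ""]

print(len(paragraphs))
print(random.choice(paragraphs))
```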

## Cutting against language

In the examples above, we've focused on units of text that fall out naturally from our pre-existing conceptions of how language works: letters form words which form lines and paragraphs. However, nothing limits us to this as an ideology for cutting up text! We have many other options. The `.split()` method, for example, will do whatever we tell it to do, including splitting with delimiters that have no particular semantic or syntactic justification, such as splitting on the letter `e`:

In [14]:
e_splits = text.split('e')
print(random.sample(e_splits, 3))

['dit', ' a gr', ' mis']


Or we can supply more complete lexical units:

In [15]:
and_splits = text.split(' and ') # note the leading and trailing spaces, to ensure it's the word 'and'
print(random.sample(and_splits, 3))

['waited anxiously to discover from\nhis discourse the meaning of these unusual appearances.\n\n"\'Do you consider,\' said his companion to him, \'that you will be\nobliged to pay three months\' rent', 'passed many hours upon the\nwater.  Sometimes, with my sails set, I was carried by the wind; and\nsometimes, after rowing into the middle of the lake, I left the boat to\npursue its own course', 'my intention is to hire a ship there, which can easily be\ndone by paying the insurance for the owner,']


The `re.split()` function from the `re` ("[regular expressions](https://github.com/aparrish/rwet/blob/master/regular-expressions-a-gentle-introduction.ipynb)") module is a slightly more advanced relative of the `.split()` method: it allows us to specify a set of strings to break on, instead of a single one. (The alternatives are inside the parentheses of the first argument to `re.split()`, following `?:`, separated by `|`. The `\b` on either side matches a word boundary, so that `he` doesn't match inside `the`. You can change these to whatever you want, and add as many alternatives as you want. Let me know if you want to do something more sophisticated than this and I can help you out!)

In [16]:
import re
pronoun_splits = re.split(r"\b(?:I|he|she|they|it)\b", text)
print(random.sample(pronoun_splits, 3))

[' cried;\n"how ', ' was considered,\nexcept in very rare instances, as a vagabond and a slave, doomed to\nwaste his powers for the profits of the chosen few!  And what was ', ' am guilty?"\n\n']


Or, if your delimiters are single characters, there's a simpler form:

In [17]:
import re
clause_splits = re.split(r"[.,;?!]", text)
print(random.sample(clause_splits, 3))

[' and your present humanity assures me of success\nwith those friends whom I am on the point of meeting', ' sorrows', '  He was descended from a good\nfamily in France']
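
A note on the `?:` in the pronoun example: it marks the parentheses as *non-capturing*, which is why the pronouns themselves vanish from the results. If you would rather keep the delimiters, `re.split()` will hand them back as separate list items when the pattern is wrapped in ordinary (capturing) parentheses. A small sketch with a made-up `sample` string:

```python
import re

sample = "Stop. Who goes there? A friend; nothing more."

# no capturing group: the delimiters are thrown away
dropped = re.split(r"[.;?]", sample)
print(dropped)

# capturing group: each delimiter is kept as its own item in the list
kept = re.split(r"([.;?])", sample)
print(kept)

# with the delimiters kept, joining the pieces recovers the original
print("".join(kept) == sample)
```

Note the empty string at the end of both lists: the sample ends with a delimiter, so there is nothing after the final split.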


### Cutting by the numbers

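We can also ignore linguistic units entirely and cut purely by count. For example, the following expression collects every seven-character stretch of the text, one starting at each character position (so neighboring units overlap, and the last few are shorter than seven characters):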

In [18]:
units = [text[i:i+7] for i in range(len(text))]
print(random.sample(units, 3))

['tracts ', 'mated g', 'd I cou']
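
The expression above takes a slice starting at *every* character position, so neighboring units overlap. If you'd rather have units that don't overlap, one option is to give `range()` a step equal to the unit length. A sketch with a made-up `sample` string (the names `windows` and `chunks` are just for this example):

```python
sample = "the quick brown fox jumps"

# a slice starting at every position: consecutive units overlap
windows = [sample[i:i+7] for i in range(len(sample))]

# a slice starting at every seventh position: no overlap, and joining
# the chunks back together recovers the original string
chunks = [sample[i:i+7] for i in range(0, len(sample), 7)]

print(windows[:3])
print(chunks)
```

Either way, the final unit (or units) may come up short when the text runs out.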


Feel free to change the number 7 to some other number to get differently-lengthed units. These could be used as a means of composing Eigner-esque short forms, e.g.:

In [19]:
units = [text[i:i+19] for i in range(len(text))]
for i in range(8):
    print(random.choice(units))

her only garb; her 
l, and nature again
 longer control her
 was quickly envelo
ed every vestige of
nd gained the frien
after, however, Fel

various subjects, 


Or we might juxtapose stretches of characters of particular lengths to create a visual composition:

In [20]:
for count in range(20):
    # inside the loop, the variable 'count' evaluates to the number of the current iteration
    units = [text[i:i+count+1] for i in range(len(text))] # quick quiz: why i+count+1?
    print(random.choice(units))

f
 I
kne
dmir
t, yo
 my
co
d terro
ow Corne
sible to 
 in those 
marriage dr
s will pass 
I swear inext
 friends and a
 the town of --
 his former mist
abin and
attended
n pale with study,
The gentle words of
 five and twenty yea


We can use this same technique with our list of words instead. For example, to break the text up into pairs of words:

In [21]:
pairs = [' '.join(words[i:i+2]) for i in range(len(words))] # quick quiz: what's the .join() method doing here?
print(random.sample(pairs, 5))

['every other', 'was about', 'and its', 'that I', 'I had']


Here we could make terrible haiku, just really bad haiku:

In [22]:
fives = [' '.join(words[i:i+5]) for i in range(len(words))]
sevens = [' '.join(words[i:i+7]) for i in range(len(words))]
print(random.choice(fives))
print(random.choice(sevens))
print(random.choice(fives))

the innocent to death and
On the evening previous to her being
whom I most loved die


### Cutting by length, with respect to word boundaries

Python includes a helpful module called `textwrap`, which "wraps" a string at word boundaries, given a maximum number of characters per line. Here's an example to make this clearer. We'll start with a single string with no line breaks:

In [23]:
story = "Once upon a time, there was a magic squirrel. The squirrel used its magic to conjure acorns from the sky in order to feed all of their friends. It was good. The end!"

The `textwrap.wrap()` function takes a string as an argument, and returns a list of strings, each of which contains sequential words from the original text; but none of the strings will be longer than the limit specified in the second argument. Like so:

In [24]:
import textwrap
textwrap.wrap(story, 40)

['Once upon a time, there was a magic',
 'squirrel. The squirrel used its magic to',
 'conjure acorns from the sky in order to',
 'feed all of their friends. It was good.',
 'The end!']

The purpose of this method is to wrap fixed-width text in order to make it fit into spaces on (e.g.) terminal emulators. But I've used it in the past as a quick-and-dirty method to introduce dramatic enjambments in prose, rendering it more verse-like:

In [25]:
segments = textwrap.wrap(text, 25)
for i in range(8):
    print(random.choice(segments))

misery were strongly
Felix and endeavoured to
that communion
it myself!  I figure to
will soon explain to what
created should perpetrate
into the recesses of
manner and had created a
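
One thing to be aware of when wrapping a text that already has line breaks in it: by default, `textwrap.wrap()` converts each whitespace character (newlines included) into a space before wrapping. That is why a segment can contain a run of two spaces where a blank line used to be. A small sketch, using a made-up `sample` string:

```python
import textwrap

# made-up sample with line breaks and a blank line in it
sample = "Letter 1\n\nYou will rejoice to hear\nthat no disaster has accompanied"

wrapped = textwrap.wrap(sample, 20)

# repr() makes the spaces inside each segment easier to see
for seg in wrapped:
    print(repr(seg))
```

If the doubled spaces bother you, one simple fix is to normalize the text first with `" ".join(sample.split())` and wrap the result.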


## Arranging the parts

We have now disarticulated our text. From a technical standpoint, what we've done is turn our original string (e.g., the original source text) into a *list* of strings. Now it remains to us to put the text back together. Broadly, this process consists of three steps:

1. *filtering* the parts: selecting the parts that are relevant to us, for whatever definition of relevance we might choose;
2. *ordering* the parts: inventing a rule for how the parts will be juxtaposed with one another, and the sequence they will fall into; and
3. *joining* the parts back together, turning the list of strings back into a single string.

We'll discuss each of these in turn. I'm going to use 20-character word-wrapped segments from the source text as an example dataset in the examples below. Make sure to run this cell before continuing:

In [26]:
segments = textwrap.wrap(text, 20)

(Note that I have no particular reason for choosing this method of disarticulation other than I think it's interesting! Feel free to choose some other method from the code above, or invent a method of your own.)
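
To give a sense of where this is headed, here is a rough sketch of all three steps in a row, on a made-up `sample` string. The particular filter, the ordering and the use of `"\n".join()` for the last step are arbitrary choices for illustration, not the only way to do it:

```python
import textwrap

sample = ("Once upon a time, there was a magic squirrel. The squirrel used "
          "its magic to conjure acorns from the sky.")

parts = textwrap.wrap(sample, 20)              # disarticulate
kept = [p for p in parts if "squirrel" in p]   # 1. filter
ordered = sorted(kept)                         # 2. order
poem_text = "\n".join(ordered)                 # 3. join

print(poem_text)
```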

### Filtering

Formally, filtering the parts of the text consists of coming up with a rule that defines which parts will be included in the final output, and which will not. It may happen that you intend to use *all* of the text's disarticulated parts, in which case your filtering rule is "don't throw anything out"—for this rule, no code is needed, and you can proceed to the next section.

However, often you're working with a lengthy text that you would like to somehow distill, or you want to "preview" your methods with a smaller sample of a text before you commit to transforming the text as a whole. In this case, filter you must! I've already implicitly talked about some of the filtering techniques below, but let's go over them in more detail.

> Note that all of these methods can be *composed*—meaning that you can take the results of one filtering process, and use it as the source in a second filtering process, which can become the source of a third filtering process, and so forth. You can either nest the expressions, or assign the intermediate results to variables. I'll show some examples below.

#### Filtering by index

You can grab a single element from the list using Python's list indexing syntax, like so:

In [27]:
segments[1234] # grab the item at index 1234

'broken until near'

Or use Python's list slicing syntax to grab a subsection of elements based on their numerical position in the list:

In [28]:
segments[100:125] # grab items from index 100 up to index 125, not inclusive

['eccentricities',
 'consistent forever.',
 'I shall satiate my',
 'ardent curiosity',
 'with the sight of a',
 'part of the world',
 'never before',
 'visited, and may',
 'tread a land never',
 'before imprinted by',
 'the foot of man.',
 'These are my',
 'enticements, and',
 'they are sufficient',
 'to conquer all fear',
 'of danger or death',
 'and to induce me to',
 'commence this',
 'laborious voyage',
 'with the joy a child',
 'feels when he',
 'embarks in a little',
 'boat, with his',
 'holiday mates, on an',
 'expedition of']

Often I'll use the list slicing syntax to grab just the first handful of lines, in order to validate that my procedure for splitting the text apart worked as I wanted it to:

In [29]:
segments[:20] # first twenty items from the list

['Frankenstein,  or',
 'the Modern',
 'Prometheus   by',
 'Mary Wollstonecraft',
 '(Godwin) Shelley',
 'Letter 1   St.',
 'Petersburgh, Dec.',
 '11th, 17--  TO Mrs.',
 'Saville, England',
 'You will rejoice to',
 'hear that no',
 'disaster has',
 'accompanied the',
 'commencement of an',
 'enterprise which you',
 'have regarded with',
 'such evil',
 'forebodings.  I',
 'arrived here',
 'yesterday, and my']

Or you can index with a negative number, to get a slice that ends at the end of the list:

In [30]:
segments[-12:] # last 12 items from the list. spoiler warning, i guess?

['will not surely',
 'think thus.',
 'Farewell."  He',
 'sprang from the',
 'cabin window as he',
 'said this, upon the',
 'ice raft which lay',
 'close to the vessel.',
 'He was soon borne',
 'away by the waves',
 'and lost in darkness',
 'and distance.']

Filtering by index has the virtue of guaranteeing that you are working with a set of parts that are contiguous in the original source text, which is one way of lending conventional semantic *coherence* to the filtered parts.
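
Slices also accept an optional third number, the *step*. Stepped slices give up some of that contiguity in exchange for a regular rhythm of omission. A quick sketch on a made-up list (`parts` is just an example name):

```python
parts = ["one", "two", "three", "four", "five", "six", "seven", "eight"]

print(parts[2:6])    # contiguous run: indexes 2, 3, 4 and 5
print(parts[::2])    # every second item, starting from the first
print(parts[::-1])   # the whole list, back to front
```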

#### Filtering at random

Picking parts at random is a useful method for creating potentially powerful juxtapositions, perhaps at the cost of a sense of cohesion, and almost certainly at the cost of conventional syntax (unless you've fashioned your disarticulation rules to produce elements with consistent syntactic structure). Python's `random` library has a number of useful functions for drawing items from a list at random, several of which I used above in the "Disarticulating the text" section. The first one I'll mention is `random.choice()`, which picks one item at random from the list:

In [31]:
random.choice(segments)

'conduct I might'

Pairing this with the `print()` function and a `for` loop is an easy way to make a short composition (short-circuiting the process of joining the filtered list back together into a single string):

In [32]:
for i in range(5):
    print(random.choice(segments))

personally to my own
ship.  I had
have ever followed
father had gradually
the emotions that


The `random.sample()` function "samples" the list that you supply, meaning that it draws a certain number of items from the list at random, without replacement (i.e., it won't draw the same element twice—imagine dealing cards from a poker deck). The function takes two arguments: the first is the list to sample from, and the second is the number of items to sample:

In [33]:
random.sample(segments, 12)

['am I to give an',
 'look of affection',
 'they observed that',
 'endeavoured to',
 'view.  About this',
 'if there was any',
 'length of time, for',
 'Henry might stand',
 'have ever followed',
 'affectation, and',
 'kindly taking my',
 'wonderful, forces']

You can easily retrieve the total number of items in the list using the `len()` function. Using this as the second parameter will return a copy of the list, in random order (like shuffling a deck of cards):

In [34]:
shuffled = random.sample(segments, len(segments))
shuffled[:20] # just show the first twenty items, to save space

['sincerely',
 'approached.  I had',
 'about the world',
 'together.  "The old',
 'agony beneath them.',
 'from all hope.  Yet',
 'and discoveries.',
 'move them on',
 'now going to claim',
 'looks; and Safie',
 'kind!  But I now',
 'wealth and rank, the',
 'hung a form which I',
 'that I had beaten',
 'was surprised, but',
 'leave Clerval in a',
 '"\'Enter,\' said De',
 'employment.  The',
 'around; I hung over',
 'some new object']

> Note: There's also a `random.shuffle()` function, which does essentially the same thing as `random.sample(x, len(x))`, but modifies the list in-place, meaning it overwrites the original list.
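
The difference is easiest to see side by side. In this sketch (made-up list, example names), note that `random.shuffle()` returns `None`, so assigning its result to a variable is a common mistake:

```python
import random

parts = ["a", "b", "c", "d", "e"]

# random.sample() leaves the original list alone and returns a new one
shuffled_copy = random.sample(parts, len(parts))
print(parts)           # still in the original order
print(shuffled_copy)

# random.shuffle() reorders the list itself and returns None, so make
# a copy first if you still need the original order
working = list(parts)
result = random.shuffle(working)
print(working)
print(result)
```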

Combining this method with the indexing method above, we might (for example) take the last 20 segments of the text and put them in random order:

In [35]:
last_twenty = segments[-20:]
random.sample(last_twenty, 20) # quick quiz: rewrite these two lines as a single expression

['away by the waves',
 'Farewell."  He',
 'He was soon borne',
 'and distance.',
 'flames. The light of',
 'think thus.',
 'or if it thinks, it',
 'that conflagration',
 'cabin window as he',
 'sprang from the',
 'said this, upon the',
 'into the sea by the',
 'winds.  My spirit',
 'ashes will be swept',
 'will fade away; my',
 'ice raft which lay',
 'will not surely',
 'close to the vessel.',
 'and lost in darkness',
 'will sleep in peace,']

#### Filtering based on properties of the string

It is also possible, as you might expect, to select elements from the list of parts based on properties of the elements themselves. The easiest way to do this is with a [list comprehension](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions) whose membership expression (after the `if` keyword) is true for the elements you want to include, and false for those you want to exclude. A common way to do this is to use Python's [string methods](https://docs.python.org/3/library/stdtypes.html#string-methods) that return `True` or `False`, or return a value that you can use in an expression of inequality. ([More explanation and example uses of Python string methods here](https://github.com/aparrish/rwet/blob/master/expressions-and-strings.ipynb).)

For example, we can find only the elements of the list that begin with the word `We` (note that this is case sensitive!):

In [36]:
[item for item in segments if item.startswith("We ")]

['We accordingly lay',
 'We watched the rapid',
 'We ascended into my',
 'We stayed several',
 'We had agreed to',
 'We saw many ruined',
 'We travelled at the',
 'We were not allowed',
 'We were told this']

Or only those elements that are entirely upper-case:

In [37]:
[item for item in segments if item.isupper()]

['WITH YOU ON YOUR',
 'WEDDING-NIGHT."',
 'WITH YOU ON YOUR',
 'ON MY WEDDING-NIGHT,',
 '"I SHALL BE WITH YOU',
 'ON YOUR WEDDING-']

Adding expressions of inequality, we can filter for lines that contain a certain substring a particular number of times, using the `.count()` method. Here are all of the segments that have at least five `a`s:

In [38]:
[item for item in segments if item.count("a") >= 5]

['practical advantage.',
 'may again and again',
 'and having amassed a',
 'avalanche and marked',
 'all appearance dead.',
 'paradisiacal dreams',
 'ocean appeared at a']

Or only the elements whose length is exactly seven characters:

In [39]:
[item for item in segments if len(item) == 7]

['offered',
 'already',
 'for its',
 'father,',
 'this to',
 'which I',
 'utterly',
 'that my',
 'renewed',
 'to this',
 'violent',
 'my dear',
 'natural',
 'mounted',
 'to your',
 'regret.',
 'aright.',
 'But the',
 'my eyes',
 'greater']

The `in` operator, if there's a string to the left and a string on the right, evaluates to `True` if the string on the left is a substring of the string to the right. We can use this to find only elements of the list that contain a particular substring:

In [40]:
[item for item in segments if 'monster' in item]

['monster whom I had',
 'this monster, but I',
 'monster seized me; I',
 'monster on whom I',
 'monster that he said',
 'monster whom I had',
 'to me as monsters',
 '"Abhorred monster!',
 'monster that I am, I',
 'I, then, a monster,',
 'monster so hideous',
 "detestable monster.'",
 "'monster!  Ugly",
 '"\'Hideous monster!',
 'shall be monsters,',
 'the monster depart',
 'the monster would',
 'that the monster',
 'rage."  The monster',
 'monster might',
 'monster already',
 'of the monster, as I',
 'me and the monster',
 'monster executed his',
 'monster had blinded',
 'the monster; he',
 'cause--the monster',
 'the monster, be',
 'monster drink deep',
 'gigantic monster,',
 'monster seen from',
 'monster has, then,',
 'The monster']

Any of these conditions can be combined using the `and` and/or `or` operators. The `and` operator evaluates to `True` if both the expression to the left and the expression to the right evaluate to `True`; the `or` operator evaluates to `True` if either the left expression *or* the right expression (or both) evaluates to `True`. For example, to filter for segments that are exactly twenty characters long and begin with the word `I`:

In [41]:
[item for item in segments if item.startswith("I ") and len(item) == 20]

['I heard of a mariner',
 'I heard of him first',
 'I was thirteen years',
 'I had a contempt for',
 'I should first break',
 'I then paused, and a',
 'I walked up and down',
 'I was forced to lean',
 'I perceived that the',
 'I was, for even when',
 'I felt it as it came',
 'I trembled with rage',
 'I am thy creature; I',
 'I lay by the side of',
 'I perceived that the',
 'I improved, however,',
 'I viewed myself in a',
 'I cleared their path',
 'I should make use of',
 'I feared yet did not',
 'I had been as I then',
 'I often referred the',
 'I saw, with surprise',
 'I asked, it is true,',
 'I and my family have',
 'I did not understand',
 'I entreat you not to',
 'I grasped his throat',
 'I concluded that the',
 'I wept bitterly, and',
 'I trembled violently',
 'I seemed to drink in',
 'I sat one evening in',
 'I might pass my life',
 'I now sank refreshed',
 'I had resolved in my',
 'I awoke I found that',
 'I gasped for breath,',
 'I was, the wind that',
 'I am the assassin of',


Or segments that contain either `monsters` or `creatures`:

In [42]:
[item for item in segments if 'monsters' in item or 'creatures' in item]

['creatures, but half',
 'creatures, and am',
 'creatures as if I',
 'fellow-creatures,',
 'creatures.  She',
 'to me as monsters',
 'human creatures.',
 'creatures, who owe',
 'fellow creatures and',
 'creatures.  One was',
 'creatures were',
 'lovely creatures; I',
 'fellow creatures',
 'creatures was',
 'creatures',
 'amiable creatures;',
 'creatures in the',
 "creatures.'",
 'creatures could',
 'shall be monsters,',
 'creatures demanded',
 'creatures lest when',
 'creatures; nay, a',
 'my fellow creatures.',
 'creatures of an',
 'fellow creatures.',
 'creatures, then']

You can chain expressions with `and` and/or `or` together. For example, find only segments that contain at least five `e`s but no other vowel characters:

In [43]:
[item for item in segments
 if item.count('e') >= 5
     and item.count('a') == 0
     and item.count('i') == 0
     and item.count('o') == 0
     and item.count('u') == 0]

['expressed these',
 'beheld the elements',
 'secret.  Remember, I',
 'I well remembered',
 'degree I never',
 'were deemed',
 'remembered them',
 'she threw her eyes',
 'her defence.  As the',
 'eyes seemed',
 'the elements, here',
 'remembered the',
 'Everywhere I see',
 'degrees, I remember,',
 'gentle breeze',
 'Yet why were these',
 'when they reserved',
 'felt depressed; when',
 'respect, her eyes',
 'the ever-gentle',
 'the wretched sphere',
 '"These were the',
 'cheered even me by',
 'the gentle breezes',
 'where these scenes',
 'every new scene,',
 'new scene; they were',
 'remembered the',
 'then entered the',
 'remembered the',
 'rendered every',
 'enemy.  She left me,',
 'speed.  There were',
 'cemetery where',
 'when they beheld the',
 'I fell, never, never']
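
As mentioned earlier, filters can also be composed by assigning intermediate results to variables, which keeps each step readable. The sketch below redoes a version of the vowel filter in two passes on a small made-up list. The built-in `all()` function is a compact alternative to a long chain of `and`s:

```python
sample_segments = ["the gentle breezes", "monster seized me; I",
                   "were deemed", "eyes seemed", "a gentle breeze"]

# step one: keep only the items with at least three e's
many_es = [item for item in sample_segments if item.count("e") >= 3]

# step two: from those, keep only the items with no other vowels.
# all() is True only if every one of the tests inside it is True
only_es = [item for item in many_es
           if all(item.count(v) == 0 for v in "aiou")]

print(only_es)
```

Like the original, this only checks lower-case vowels, so a capital `I` slips through.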

### Sorting

At this point, you have a filtered list of segments that will be a part of your composition. If you used `random` to select the items, then the order of those segments will also be random; otherwise, the order of the segments will follow the order that you've been iterating over them—i.e., the order that they occur in the text. It is possible to order these segments otherwise. We'll do this using Python's `sorted` function.

For the following examples, I'm going to use a list of our segments filtered to only the segments that begin with the word `she` and contain fewer than four spaces, which works out to at most four words (again for no reason other than it seems like fun). I assign this to the variable `filtered`, which I will use in the expressions below.

In [44]:
# quick quiz: can you think of other ways to determine the number of words in a string?
filtered = [item for item in segments if item.startswith('she ') and item.count(' ') < 4]

In [45]:
filtered

['she plaited straw',
 'she knelt by',
 'she had gone',
 'she had been',
 'she found ample',
 'she heard that the',
 'she looked so frank-',
 'she paid the',
 'she was much altered',
 'she the accused?',
 'she should suffer as',
 'she threw her eyes',
 'she had passed the',
 'she said, "how',
 'she desired',
 'she did not answer.',
 'she said; "that pang',
 'she with difficulty',
 'she would have been',
 'she had nursed from',
 'she returned bearing',
 'she and the youth',
 'she endeavoured to',
 'she had risen, he',
 'she saw him, threw',
 'she came.  Some',
 'she gently deplored',
 'she heard of the',
 'she assuredly act if',
 'she could give me,',
 'she shall atone.',
 'she should find her']

If you invoke the `sorted()` function with a list as its sole argument, it evaluates to a copy of the list in alphabetical order:

In [46]:
sorted(filtered)

['she and the youth',
 'she assuredly act if',
 'she came.  Some',
 'she could give me,',
 'she desired',
 'she did not answer.',
 'she endeavoured to',
 'she found ample',
 'she gently deplored',
 'she had been',
 'she had gone',
 'she had nursed from',
 'she had passed the',
 'she had risen, he',
 'she heard of the',
 'she heard that the',
 'she knelt by',
 'she looked so frank-',
 'she paid the',
 'she plaited straw',
 'she returned bearing',
 'she said, "how',
 'she said; "that pang',
 'she saw him, threw',
 'she shall atone.',
 'she should find her',
 'she should suffer as',
 'she the accused?',
 'she threw her eyes',
 'she was much altered',
 'she with difficulty',
 'she would have been']

This is a crude method of writing something resembling an [abecedarian](https://poets.org/glossary/abecedarian) or a [catalog poem](https://www.poetryfoundation.org/articles/157311/taking-stock-with-the-catalog-poem).
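
One caveat about "alphabetical" order: Python compares strings character by character using their Unicode code points, and all of the upper-case ASCII letters come before all of the lower-case ones. Every item in `filtered` starts with a lower-case `she`, so it doesn't matter there, but with mixed-case items it does. A sketch on a made-up list, borrowing the `key` parameter that is explained further down:

```python
parts = ["she desired", "But the", "my eyes", "We were told this"]

# default sort: every capitalized item lands before every lowercase one
default_order = sorted(parts)
print(default_order)

# sorting on a lowercased stand-in gives dictionary-style order
folded_order = sorted(parts, key=lambda item: item.lower())
print(folded_order)
```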

With the addition of `reverse=True` as a parameter, the items are returned in reverse alphabetical order:

In [47]:
sorted(filtered, reverse=True)

['she would have been',
 'she with difficulty',
 'she was much altered',
 'she threw her eyes',
 'she the accused?',
 'she should suffer as',
 'she should find her',
 'she shall atone.',
 'she saw him, threw',
 'she said; "that pang',
 'she said, "how',
 'she returned bearing',
 'she plaited straw',
 'she paid the',
 'she looked so frank-',
 'she knelt by',
 'she heard that the',
 'she heard of the',
 'she had risen, he',
 'she had passed the',
 'she had nursed from',
 'she had gone',
 'she had been',
 'she gently deplored',
 'she found ample',
 'she endeavoured to',
 'she did not answer.',
 'she desired',
 'she could give me,',
 'she came.  Some',
 'she assuredly act if',
 'she and the youth']

The `sorted()` function can sort not just alphabetically, but using any arbitrary property of the string that can be sorted—not just strings but also, say, numbers. Using this feature of the function is a bit tricky, but I'm going to try to break it down as simply as possible. Schematically, it looks like this:

    sorted(your_list, key=lambda item: your_expression)
    
... where `your_list` is the list you want to sort, and `your_expression` is some Python expression that evaluates to the value that you want to use to stand in for each element in the list for the purpose of sorting, with `item` being available in the expression to refer to the value of that element. Essentially, you are writing an expression that Python will use to determine the *sorting value* for the corresponding element in the list. Let's look at a quick example. The following expression sorts the elements of the list by their length:

In [48]:
sorted(filtered, key=lambda item: len(item))

['she desired',
 'she knelt by',
 'she had gone',
 'she had been',
 'she paid the',
 'she said, "how',
 'she found ample',
 'she came.  Some',
 'she the accused?',
 'she heard of the',
 'she shall atone.',
 'she plaited straw',
 'she and the youth',
 'she had risen, he',
 'she heard that the',
 'she threw her eyes',
 'she had passed the',
 'she endeavoured to',
 'she saw him, threw',
 'she could give me,',
 'she did not answer.',
 'she with difficulty',
 'she would have been',
 'she had nursed from',
 'she gently deplored',
 'she should find her',
 'she looked so frank-',
 'she was much altered',
 'she should suffer as',
 'she said; "that pang',
 'she returned bearing',
 'she assuredly act if']

And the following sorts by the number of `e`s in the element:

In [49]:
sorted(filtered, key=lambda item: item.count('e'))

['she said, "how',
 'she said; "that pang',
 'she with difficulty',
 'she plaited straw',
 'she knelt by',
 'she had gone',
 'she found ample',
 'she looked so frank-',
 'she paid the',
 'she should suffer as',
 'she did not answer.',
 'she had nursed from',
 'she and the youth',
 'she saw him, threw',
 'she assuredly act if',
 'she shall atone.',
 'she should find her',
 'she had been',
 'she heard that the',
 'she was much altered',
 'she the accused?',
 'she had passed the',
 'she desired',
 'she had risen, he',
 'she came.  Some',
 'she heard of the',
 'she could give me,',
 'she would have been',
 'she returned bearing',
 'she endeavoured to',
 'she gently deplored',
 'she threw her eyes']

And the following sorts alphabetically by the last word in the element, rather than the first:

In [50]:
# this one is tricky! hint: what kind of value does .split() evaluate to?
sorted(filtered, key=lambda item: item.split()[-1]) 

['she said, "how',
 'she came.  Some',
 'she the accused?',
 'she was much altered',
 'she found ample',
 'she did not answer.',
 'she should suffer as',
 'she shall atone.',
 'she returned bearing',
 'she had been',
 'she would have been',
 'she knelt by',
 'she gently deplored',
 'she desired',
 'she with difficulty',
 'she threw her eyes',
 'she looked so frank-',
 'she had nursed from',
 'she had gone',
 'she had risen, he',
 'she should find her',
 'she assuredly act if',
 'she could give me,',
 'she said; "that pang',
 'she plaited straw',
 'she heard that the',
 'she paid the',
 'she had passed the',
 'she heard of the',
 'she saw him, threw',
 'she endeavoured to',
 'she and the youth']
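If the `lambda` in that last cell is hard to read, it may help to see the same sort written with an ordinary named function. This is only a sketch: the short `phrases` list and the helper name `last_word` are made up for illustration, and the notebook's `filtered` list would work the same way.

```python
# a small stand-in list; the notebook's `filtered` list would work the same way
phrases = ['she had gone', 'she desired', 'she paid the', 'she knelt by']

def last_word(item):
    # .split() evaluates to a list of words; [-1] picks the final one
    return item.split()[-1]

# passing the function itself as the key...
by_function = sorted(phrases, key=last_word)
# ...gives the same result as the equivalent lambda expression
by_lambda = sorted(phrases, key=lambda item: item.split()[-1])

print(by_function)
```

Either way, Python calls the key once for each element and sorts the elements by the values it gets back.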

### Permutations and combinations

Another method for arranging units of text is *permutation* and its close relative, *combination*. A permutation of a sequence of elements is one possible ordering of those elements, and listing the permutations of a sequence means producing *every possible arrangement*. You can also find every unique *combination* of a certain length of elements in a list. In permutations, the order of the elements matters (i.e., 'ABCD' and 'BACD' are different permutations of the characters in the string 'ABCD'), whereas in combinations, the order does not matter (i.e., 'ABC' and 'ACB' are the same combination, since both contain the same elements). I have a separate tutorial with [more information on permutations, combinations and randomness](https://github.com/aparrish/eroft/blob/master/randomness.ipynb) using Tarot cards as an example, if you're interested!

It's easier to understand these concepts with a concrete example. First, we need to import the `itertools` library (short for "iteration tools"), which has handy functions for producing permutations and combinations.

In [51]:
import itertools

Now, I define a list of elements, which we're going to apply permutations and combinations to. (We'll use data from the text file later, but it's easier to begin with a list of strings whose length and contents we can easily see.)

In [52]:
fruits = ['apple', 'banana', 'lemon', 'raspberry']

The expression below calls the `permutations()` function in the `itertools` library and turns the result into a list. Each element of that list is a tuple containing a different permutation of the four items in our original list:

In [53]:
list(itertools.permutations(fruits))

[('apple', 'banana', 'lemon', 'raspberry'),
 ('apple', 'banana', 'raspberry', 'lemon'),
 ('apple', 'lemon', 'banana', 'raspberry'),
 ('apple', 'lemon', 'raspberry', 'banana'),
 ('apple', 'raspberry', 'banana', 'lemon'),
 ('apple', 'raspberry', 'lemon', 'banana'),
 ('banana', 'apple', 'lemon', 'raspberry'),
 ('banana', 'apple', 'raspberry', 'lemon'),
 ('banana', 'lemon', 'apple', 'raspberry'),
 ('banana', 'lemon', 'raspberry', 'apple'),
 ('banana', 'raspberry', 'apple', 'lemon'),
 ('banana', 'raspberry', 'lemon', 'apple'),
 ('lemon', 'apple', 'banana', 'raspberry'),
 ('lemon', 'apple', 'raspberry', 'banana'),
 ('lemon', 'banana', 'apple', 'raspberry'),
 ('lemon', 'banana', 'raspberry', 'apple'),
 ('lemon', 'raspberry', 'apple', 'banana'),
 ('lemon', 'raspberry', 'banana', 'apple'),
 ('raspberry', 'apple', 'banana', 'lemon'),
 ('raspberry', 'apple', 'lemon', 'banana'),
 ('raspberry', 'banana', 'apple', 'lemon'),
 ('raspberry', 'banana', 'lemon', 'apple'),
 ('raspberry', 'lemon', 'apple', 'banana'),
 ('raspberry', 'lemon', 'banana', 'apple')]

All possible orderings of the four elements are displayed. (There are $n!$ possible permutations of $n$ elements: in this case $4 \times 3 \times 2 \times 1$.) The following cell represents this a bit more clearly, by combining each permutation into a single string:

In [54]:
[' '.join(item) for item in itertools.permutations(fruits)]

['apple banana lemon raspberry',
 'apple banana raspberry lemon',
 'apple lemon banana raspberry',
 'apple lemon raspberry banana',
 'apple raspberry banana lemon',
 'apple raspberry lemon banana',
 'banana apple lemon raspberry',
 'banana apple raspberry lemon',
 'banana lemon apple raspberry',
 'banana lemon raspberry apple',
 'banana raspberry apple lemon',
 'banana raspberry lemon apple',
 'lemon apple banana raspberry',
 'lemon apple raspberry banana',
 'lemon banana apple raspberry',
 'lemon banana raspberry apple',
 'lemon raspberry apple banana',
 'lemon raspberry banana apple',
 'raspberry apple banana lemon',
 'raspberry apple lemon banana',
 'raspberry banana apple lemon',
 'raspberry banana lemon apple',
 'raspberry lemon apple banana',
 'raspberry lemon banana apple']
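As a quick sanity check on the $n!$ claim, this sketch counts the permutations directly and compares the count with `math.factorial()`. It redefines `fruits` so that it can run on its own. It also uses the optional second argument of `permutations()`, which asks for shorter arrangements (here, ordered pairs) instead of full-length ones.

```python
import itertools
import math

fruits = ['apple', 'banana', 'lemon', 'raspberry']

# count the full-length permutations by actually generating them
perm_count = len(list(itertools.permutations(fruits)))

# n! computed directly from the length of the list
expected = math.factorial(len(fruits))
print(perm_count, expected)

# the optional second argument asks for arrangements of just two items
pairs = list(itertools.permutations(fruits, 2))
print(len(pairs))  # 4 choices for the first item, 3 for the second
```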

In the following cell, we find every possible *combination* of three elements from the list. Each one of these combinations is unique, regardless of the order in which its elements occur:

In [55]:
list(itertools.combinations(['apple', 'banana', 'lemon', 'raspberry'], 3))

[('apple', 'banana', 'lemon'),
 ('apple', 'banana', 'raspberry'),
 ('apple', 'lemon', 'raspberry'),
 ('banana', 'lemon', 'raspberry')]

Again, here's a cell that makes this a little bit more friendly by joining each combination into a single string:

In [56]:
[' '.join(item) for item in itertools.combinations(['apple', 'banana', 'lemon', 'raspberry'], 3)]

['apple banana lemon',
 'apple banana raspberry',
 'apple lemon raspberry',
 'banana lemon raspberry']

There are $\frac{n!}{k!(n - k)!}$ possible combinations of length $k$ in a collection with $n$ elements; if you're curious, the following function calculates this (with $k$ called `r` in the code):

In [57]:
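import operator as op
from functools import reduce

def ncr(n, r):
    # choosing r items is the same as choosing which n-r items to leave out
    r = min(r, n-r)
    # n * (n-1) * ... * (n-r+1)
    numer = reduce(op.mul, range(n, n-r, -1), 1)
    # r!
    denom = reduce(op.mul, range(1, r+1), 1)
    return numer // denom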

In [58]:
ncr(len(fruits), 3)

4
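On Python 3.8 and later, the standard library's `math.comb()` does the same calculation, so the hand-written function can be checked against it. The sketch below repeats the `ncr()` definition so that it runs on its own.

```python
import math
import operator as op
from functools import reduce

def ncr(n, r):
    r = min(r, n - r)
    numer = reduce(op.mul, range(n, n - r, -1), 1)
    denom = reduce(op.mul, range(1, r + 1), 1)
    return numer // denom

# compare against math.comb (Python 3.8+) for every small n and r
for n in range(0, 8):
    for r in range(0, n + 1):
        assert ncr(n, r) == math.comb(n, r)

print(math.comb(4, 3))
```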

A well-known example of poetry that uses permutation is Brion Gysin's [permutation poems](https://nickm.com/memslam/permutation_poems.html), one of which I re-implement below:

In [59]:
[' '.join(item) for item in itertools.permutations("kick that habit man".split())]

['kick that habit man',
 'kick that man habit',
 'kick habit that man',
 'kick habit man that',
 'kick man that habit',
 'kick man habit that',
 'that kick habit man',
 'that kick man habit',
 'that habit kick man',
 'that habit man kick',
 'that man kick habit',
 'that man habit kick',
 'habit kick that man',
 'habit kick man that',
 'habit that kick man',
 'habit that man kick',
 'habit man kick that',
 'habit man that kick',
 'man kick that habit',
 'man kick habit that',
 'man that kick habit',
 'man that habit kick',
 'man habit kick that',
 'man habit that kick']

#### Exponential!

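Note that the number of possible permutations for a list of things grows explosively with each additional item added to the list: the growth is factorial, which is even faster than exponential. (The number of combinations can also get very large as the list grows.) For four items, we have only a handful of permutations: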

In [60]:
4 * 3 * 2 * 1

24

But for as few as ten items, we have several *million*:

In [61]:
10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1

3628800
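To get a feel for how quickly this grows, the following sketch prints the number of permutations for lists of one to twelve items, using `math.factorial()` instead of writing out the multiplication by hand.

```python
import math

# number of permutations (n!) for lists of increasing length
counts = [math.factorial(n) for n in range(1, 13)]

for n, count in enumerate(counts, start=1):
    print(n, count)
```

By twelve items the count is already in the hundreds of millions.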

This means that when you're using permutations and combinations, you should always work with a source list of just a few items, maybe seven or eight at most. For example, using the filtered list we made in the previous section:

In [62]:
sampled = random.sample(filtered, 4)

In [63]:
# all combinations of length 2
[' '.join(item) for item in list(itertools.combinations(sampled, 2))]

['she desired she would have been',
 'she desired she shall atone.',
 'she desired she plaited straw',
 'she would have been she shall atone.',
 'she would have been she plaited straw',
 'she shall atone. she plaited straw']

In [64]:
# all permutations
[' '.join(item) for item in list(itertools.permutations(sampled))]

['she desired she would have been she shall atone. she plaited straw',
 'she desired she would have been she plaited straw she shall atone.',
 'she desired she shall atone. she would have been she plaited straw',
 'she desired she shall atone. she plaited straw she would have been',
 'she desired she plaited straw she would have been she shall atone.',
 'she desired she plaited straw she shall atone. she would have been',
 'she would have been she desired she shall atone. she plaited straw',
 'she would have been she desired she plaited straw she shall atone.',
 'she would have been she shall atone. she desired she plaited straw',
 'she would have been she shall atone. she plaited straw she desired',
 'she would have been she plaited straw she desired she shall atone.',
 'she would have been she plaited straw she shall atone. she desired',
 'she shall atone. she desired she would have been she plaited straw',
 'she shall atone. she desired she plaited straw she would have been',
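 'she shall atone. she would have been she desired she plaited straw',
 'she shall atone. she would have been she plaited straw she desired',
 'she shall atone. she plaited straw she desired she would have been',
 'she shall atone. she plaited straw she would have been she desired',
 'she plaited straw she desired she would have been she shall atone.',
 'she plaited straw she desired she shall atone. she would have been',
 'she plaited straw she would have been she desired she shall atone.',
 'she plaited straw she would have been she shall atone. she desired',
 'she plaited straw she shall atone. she desired she would have been',
 'she plaited straw she shall atone. she would have been she desired']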
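With a longer source list, one workaround is to avoid building the full list of permutations at all. `itertools.permutations()` produces its results lazily, one at a time, so `itertools.islice()` can take just the first few. This is a sketch under assumptions: the ten-word `words` list is made up as a stand-in for your own text units.

```python
import itertools

# ten items would have 3,628,800 permutations, too many to print
words = "one two three four five six seven eight nine ten".split()

# take only the first five permutations instead of generating them all
first_five = [' '.join(item)
              for item in itertools.islice(itertools.permutations(words), 5)]

for line in first_five:
    print(line)
```

The first few permutations differ only in their last few items, so for more variety you might prefer `random.shuffle()` on a copy of the list, which gives you a single arbitrary permutation.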

#### Products

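The `product()` function from `itertools` produces the *Cartesian product* of two or more lists: every possible tuple that takes one element from each list, in order. For example, pairing each fruit with each kind of baked good: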

In [65]:
list(itertools.product(fruits, ['bread', 'cake', 'tart']))

[('apple', 'bread'),
 ('apple', 'cake'),
 ('apple', 'tart'),
 ('banana', 'bread'),
 ('banana', 'cake'),
 ('banana', 'tart'),
 ('lemon', 'bread'),
 ('lemon', 'cake'),
 ('lemon', 'tart'),
 ('raspberry', 'bread'),
 ('raspberry', 'cake'),
 ('raspberry', 'tart')]

In [66]:
[' '.join(item) for item
    in list(itertools.product(['apple', 'banana', 'lemon', 'raspberry'], ['bread', 'cake', 'tart']))]

['apple bread',
 'apple cake',
 'apple tart',
 'banana bread',
 'banana cake',
 'banana tart',
 'lemon bread',
 'lemon cake',
 'lemon tart',
 'raspberry bread',
 'raspberry cake',
 'raspberry tart']
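The number of results from `product()` is the product of the lengths of the input lists (here $4 \times 3 = 12$), and the function accepts more than two lists. A minimal sketch, where the `adjectives` list and the variable names are made up for illustration:

```python
import itertools

fruits = ['apple', 'banana', 'lemon', 'raspberry']
desserts = ['bread', 'cake', 'tart']
adjectives = ['warm', 'cold']  # hypothetical third list

# every (adjective, fruit, dessert) triple, joined into a phrase
phrases = [' '.join(item)
           for item in itertools.product(adjectives, fruits, desserts)]

print(len(phrases))  # 2 * 4 * 3
print(phrases[:3])
```

As with permutations, the counts multiply quickly, so it's worth checking the lengths of your lists before asking for the whole product.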

## Joining the parts back together

Nearly all of the procedures above have resulted in a list of strings. There are a few strategies for combining those lists of strings back into a single string that you can cut-and-paste, save to a file, or use however else you want. The first is to join the strings together, using the `.join()` method. The `.join()` method works like this:

    glue.join(your_list_of_strings)
    
... where *glue* is the string that you want to use as "filler" between each of the elements of *your_list_of_strings*.

Before I demonstrate, I'm going to rebuild my broken-up and filtered list of text units below, consisting of:

* segments of text from the source text, word wrapped, with no segment longer than 20 characters;
* filtered to contain only those segments that start with the word 'she' and contain fewer than four spaces;
* sorted in order of length.

In [67]:
segments = textwrap.wrap(text, 20)
filtered = [item for item in segments if item.startswith('she ') and item.count(' ') < 4]
final = sorted(filtered, key=lambda item: len(item))

One of the most common strategies here is to insert a linebreak between each element, using `\n` as the "glue." For example:

In [68]:
"\n".join(final)

'she desired\nshe knelt by\nshe had gone\nshe had been\nshe paid the\nshe said, "how\nshe found ample\nshe came.  Some\nshe the accused?\nshe heard of the\nshe shall atone.\nshe plaited straw\nshe and the youth\nshe had risen, he\nshe heard that the\nshe threw her eyes\nshe had passed the\nshe endeavoured to\nshe saw him, threw\nshe could give me,\nshe did not answer.\nshe with difficulty\nshe would have been\nshe had nursed from\nshe gently deplored\nshe should find her\nshe looked so frank-\nshe was much altered\nshe should suffer as\nshe said; "that pang\nshe returned bearing\nshe assuredly act if'

This looks like a mess, but again, if we want Python to actually render the line breaks, we need to use `print()`:

In [69]:
print("\n".join(final))

she desired
she knelt by
she had gone
she had been
she paid the
she said, "how
she found ample
she came.  Some
she the accused?
she heard of the
she shall atone.
she plaited straw
she and the youth
she had risen, he
she heard that the
she threw her eyes
she had passed the
she endeavoured to
she saw him, threw
she could give me,
she did not answer.
she with difficulty
she would have been
she had nursed from
she gently deplored
she should find her
she looked so frank-
she was much altered
she should suffer as
she said; "that pang
she returned bearing
she assuredly act if
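Since `.join()` gives you one ordinary string, saving the result to a file takes only a couple more lines. The following is a sketch under assumptions: the filename `output.txt` is hypothetical, the short `final` list is a stand-in for the one built above, and the file goes into a temporary directory so that running the cell doesn't overwrite anything.

```python
import os
import tempfile

# a short stand-in for the notebook's `final` list
final = ['she desired', 'she knelt by', 'she had gone']

# "output.txt" is a hypothetical filename; a temporary directory keeps
# this sketch from overwriting anything you care about
output_path = os.path.join(tempfile.mkdtemp(), "output.txt")

with open(output_path, "w", encoding="utf-8") as outfile:
    outfile.write("\n".join(final))

# read the file back to confirm what was written
with open(output_path, encoding="utf-8") as infile:
    saved = infile.read()
print(saved)
```

In your own notebook you would probably pass a plain filename to `open()` so that the file lands next to the notebook.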


We could also re-join these elements using whatever string we want, say an emoji:

In [70]:
print(" ☠️ ".join(final))

she desired ☠️ she knelt by ☠️ she had gone ☠️ she had been ☠️ she paid the ☠️ she said, "how ☠️ she found ample ☠️ she came.  Some ☠️ she the accused? ☠️ she heard of the ☠️ she shall atone. ☠️ she plaited straw ☠️ she and the youth ☠️ she had risen, he ☠️ she heard that the ☠️ she threw her eyes ☠️ she had passed the ☠️ she endeavoured to ☠️ she saw him, threw ☠️ she could give me, ☠️ she did not answer. ☠️ she with difficulty ☠️ she would have been ☠️ she had nursed from ☠️ she gently deplored ☠️ she should find her ☠️ she looked so frank- ☠️ she was much altered ☠️ she should suffer as ☠️ she said; "that pang ☠️ she returned bearing ☠️ she assuredly act if


The `.join()` method evaluates to a string, so we can also use that string as input for one of the articulation methods we used before, and start all over again from scratch. For example, let's text wrap the emoji-joined text:

In [71]:
skulls = " ☠️ ".join(final)
skull_lines = textwrap.wrap(skulls, 25)
print("\n".join(skull_lines))

she desired ☠️ she knelt
by ☠️ she had gone ☠️ she
had been ☠️ she paid the
☠️ she said, "how ☠️ she
found ample ☠️ she came.
Some ☠️ she the accused?
☠️ she heard of the ☠️
she shall atone. ☠️ she
plaited straw ☠️ she and
the youth ☠️ she had
risen, he ☠️ she heard
that the ☠️ she threw her
eyes ☠️ she had passed
the ☠️ she endeavoured to
☠️ she saw him, threw ☠️
she could give me, ☠️ she
did not answer. ☠️ she
with difficulty ☠️ she
would have been ☠️ she
had nursed from ☠️ she
gently deplored ☠️ she
should find her ☠️ she
looked so frank- ☠️ she
was much altered ☠️ she
should suffer as ☠️ she
said; "that pang ☠️ she
returned bearing ☠️ she
assuredly act if


You can also use a `for` loop to iterate over the units of text and print them individually, like so:

In [72]:
for item in final:
    print(item)

she desired
she knelt by
she had gone
she had been
she paid the
she said, "how
she found ample
she came.  Some
she the accused?
she heard of the
she shall atone.
she plaited straw
she and the youth
she had risen, he
she heard that the
she threw her eyes
she had passed the
she endeavoured to
she saw him, threw
she could give me,
she did not answer.
she with difficulty
she would have been
she had nursed from
she gently deplored
she should find her
she looked so frank-
she was much altered
she should suffer as
she said; "that pang
she returned bearing
she assuredly act if


The following form of the `for` loop makes a variable called `i` available that corresponds to the index of the current line, counting from zero:

In [73]:
for i, item in enumerate(final):
    print(i, item)

0 she desired
1 she knelt by
2 she had gone
3 she had been
4 she paid the
5 she said, "how
6 she found ample
7 she came.  Some
8 she the accused?
9 she heard of the
10 she shall atone.
11 she plaited straw
12 she and the youth
13 she had risen, he
14 she heard that the
15 she threw her eyes
16 she had passed the
17 she endeavoured to
18 she saw him, threw
19 she could give me,
20 she did not answer.
21 she with difficulty
22 she would have been
23 she had nursed from
24 she gently deplored
25 she should find her
26 she looked so frank-
27 she was much altered
28 she should suffer as
29 she said; "that pang
30 she returned bearing
31 she assuredly act if
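If you'd rather number the lines from one, `enumerate()` takes an optional `start` argument, and an f-string can pad the numbers so that they line up. A small sketch, with a short stand-in for the `final` list:

```python
# a short stand-in for the notebook's `final` list
final = ['she desired', 'she knelt by', 'she had gone']

# start=1 numbers the lines from one instead of zero;
# {i:>2} right-aligns the number in a field two characters wide
numbered = [f"{i:>2}. {item}" for i, item in enumerate(final, start=1)]

print("\n".join(numbered))
```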


You can use this to create interesting effects, e.g.:

In [74]:
import math
for i, item in enumerate(final):
    print(" " * int((math.sin(i*0.5)+1)*10), item)

           she desired
               she knelt by
                   she had gone
                    she had been
                    she paid the
                she said, "how
            she found ample
       she came.  Some
   she the accused?
 she heard of the
 she shall atone.
   she plaited straw
        she and the youth
             she had risen, he
                 she heard that the
                    she threw her eyes
                    she had passed the
                  she endeavoured to
               she saw him, threw
          she could give me,
     she did not answer.
  she with difficulty
 she would have been
  she had nursed from
     she gently deplored
          she should find her
               she looked so frank-
                   she was much altered
                    she should suffer as
                    she said; "that pang
                 she returned bearing
             she assuredly act if


Sorry to drop that one on you, I was having fun. Come see me in office hours if you want more details on how it works :)
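For the curious, here is one way to unpack that expression step by step. The intermediate variable names (`wave`, `shifted`, `indent`) and the `indents` list are mine, added for illustration, and the four-item `final` list is a stand-in for the full one.

```python
import math

# a short stand-in for the notebook's `final` list
final = ['she desired', 'she knelt by', 'she had gone', 'she had been']

indents = []
for i, item in enumerate(final):
    wave = math.sin(i * 0.5)      # swings smoothly between -1 and 1
    shifted = wave + 1            # now between 0 and 2
    indent = int(shifted * 10)    # scaled to a whole number from 0 to 20
    indents.append(indent)
    # print() adds one more space between its two arguments
    print(" " * indent, item)
```

Changing `0.5` makes the wave tighter or looser, and changing `10` makes it wider or narrower.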

## Further resources

* My [tutorial on lists and lines](https://github.com/aparrish/rwet/blob/master/understanding-lists-manipulating-lines.ipynb), which is a different take on some of the same material above, including a more thorough description of list comprehensions