We'll start by using the [markovify](https://github.com/jsvine/markovify/) library to make some individual sentences in the style of Jane Austen.  These will be the basis for generating a stream of synthetic documents.

In [1]:
import markovify
import codecs
import random

# markovify draws from Python's global `random` generator, so a notebook is
# only reproducible if you set the seed before each cell that uses markovify
random.seed(0xbaff1ed)

with codecs.open("data/austen.txt", "r", "cp1252") as f:
    text = f.read()

austen_model = markovify.Text(text, retain_original=False, state_size=3)

for i in range(10):
    print(austen_model.make_short_sentence(200))

Such and such-like were the reasonings of Sir Thomas, and in smaller concerns by her sister.
I began to think my caro sposo would be absolutely jealous.
The introduction, however, was immediately made; and she had been last together; much less could her feelings acquit her of having made Harriet unhappy.
Mr. Bennet and his daughters saw all the impropriety of her father's comfort, perhaps even of his life, and you know young people like to be married to a Miss Hawkins.
She could not have believed them.
She was less handsome than her brother; but she could not be a doubt of your secrecy.
I speak feelingly.
She looked about her with due consideration, and found almost everything in his favour, should think highly of himself.
Well, she went on to say something sensible, but knew not what answer she returned to the Park, and Elinor was not blinded by the beauty, or the shrewd look of the youngest, to her want of sense.
The wish was rather eager than lasting.
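
To make the mechanics concrete, here is a minimal toy sketch of a word-level Markov chain with a configurable state size. It is not markovify's implementation; `build_chain`, `generate`, and the tiny corpus are hypothetical names and data invented for illustration. It also shows why reseeding the global generator before generating gives repeatable text.

```python
import random
from collections import defaultdict

def build_chain(text, state_size=2):
    """Map each tuple of `state_size` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - state_size):
        state = tuple(words[i:i + state_size])
        chain[state].append(words[i + state_size])
    return dict(chain)

def generate(chain, start, max_words=12, rng=random):
    """Walk the chain from `start`, sampling one observed follower at each step."""
    out = list(start)
    state = tuple(start)
    while len(out) < max_words and state in chain:
        nxt = rng.choice(chain[state])
        out.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(out)

# A made-up corpus; punctuation is treated as an ordinary token here.
corpus = "she was not happy . she was not sorry . she was quite happy ."
chain = build_chain(corpus, state_size=2)

# Reseeding the global generator before each run reproduces the same walk.
random.seed(0xbaff1ed)
first = generate(chain, ("she", "was"))
random.seed(0xbaff1ed)
second = generate(chain, ("she", "was"))

print(first)
print(first == second)
```

A larger state size makes each step depend on more context, so output tends to stay closer to the source text, at the cost of fewer possible continuations per state.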


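Constructing single sentences is interesting, but we'd really rather construct larger documents. Here we'll construct documents in batches of ten. Each document's length is a Poisson draw with mean five plus one extra sentence, so no document is empty and documents average six sentences.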

In [2]:
from scipy.stats import poisson
import numpy as np

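def make_basic_document(sentence_count=5, model=austen_model, seed=None):
    """Return a list of ten documents generated from `model`."""
    def shortsentence(ct):
        # join ct + 1 sentences so that a Poisson draw of zero is still non-empty
        return " ".join([model.make_short_sentence(200) for _ in range(ct + 1)])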

    if seed is not None:
        # seed both the Python generator and the NumPy one used by SciPy
        random.seed(seed)
        np.random.seed(seed)
    
    return [shortsentence(ct) for ct in poisson.rvs(sentence_count, size=10)]

for doc in make_basic_document(5, seed=0xdecaf):
    print(doc)
    print("\n###\n")

We cannot have two Agathas, and we must have one Cottager's wife; and I am afraid you will be so well off. JOHNSON TO LADY S. VERNON Edward Street. We were all alive.

###

And we agreed it would be delightful. By one measure I might have written home. I am not sorry for, as I know you have the art of pleasing--the art of pleasing, at least, at Kellynch Hall; and who had opportunities of seeing me. By this time, the subject was equally convinced that it is sometimes carried a little too nice. Jane was not happy. Mr Elliot had made a change indeed! It was only necessary to mention any favourite amusement to engage her to talk.

###

He shook his head. I see it in her eyes, seemed all that he wished. A very proper compliment!--and then follows the application, which I think, my dear Harriet, you cannot find much difficulty in comprehending. Here Fanny, who had hoped to see. The necessity of the measure in a pecuniary light, and the hope of her being well principled and religious.

###

A
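
The effect of the `ct + 1` in `shortsentence` on document length can be checked with a quick simulation. The sketch below is standard-library only: it substitutes a hand-written Knuth-style sampler (`poisson_draw`, a hypothetical helper) for `scipy.stats.poisson.rvs`, so it illustrates the distribution rather than reproducing the cell above.

```python
import math
import random

def poisson_draw(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

rng = random.Random(0xdecaf)

# One extra sentence per document, mirroring range(ct + 1) in shortsentence.
counts = [poisson_draw(5, rng) + 1 for _ in range(10_000)]

print("shortest document:", min(counts))
print("mean sentences per document:", sum(counts) / len(counts))
```

The shortest document always has at least one sentence, and the mean settles near six rather than five.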

We're going to use the Austen model as the main basis for _legitimate messages_ in our sample data set.  For the _spam messages_, we'll train two Markov models on positive and negative product reviews (taken from the [public-domain Amazon fine foods reviews dataset on Kaggle](https://www.kaggle.com/snap/amazon-fine-food-reviews/)).  We'll combine the models from these sources in different proportions so that all words are _possible_ in certain kinds of messages but some words are _more likely_ in legitimate messages or in spam.

In [3]:
import gzip

def train_markov_gz(fn):
    """ trains a Markov model on gzipped text data """
    with gzip.open(fn, "rt", encoding="utf-8") as f:
        text = f.read()
    return markovify.Text(text, retain_original=False, state_size=3)

negative_model = train_markov_gz("data/reviews-1.txt.gz")
positive_model = train_markov_gz("data/reviews-5-100k.txt.gz")
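
As a quick check of the file handling in `train_markov_gz`, the sketch below writes a small gzipped text file and reads it back in text mode (`"rt"`), which decodes bytes to `str` using the given encoding. The file name and sample text are made up for illustration.

```python
import gzip
import os
import tempfile

sample = "Great product. Would buy again.\nPractically inedible.\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "reviews-sample.txt.gz")  # hypothetical file name

    # "wt" compresses and encodes in one step
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(sample)

    # "rt" decompresses and decodes, yielding str rather than bytes
    with gzip.open(path, "rt", encoding="utf-8") as f:
        round_tripped = f.read()

print(round_tripped == sample)
```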

We can combine these models with relative weights, but this yields somewhat unusual results:

In [4]:
legitimate_model = markovify.combine([austen_model, negative_model, positive_model], [196, 2, 2])
spam_model = markovify.combine([austen_model, negative_model, positive_model], [3, 30, 40])
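
One way to picture weighted combination is as scaling each model's transition counts by its weight and then summing them. The sketch below illustrates that idea on toy data. It assumes this is roughly how `markovify.combine` behaves and is not markovify's code; `combine_counts`, `probability`, and the counts themselves are hypothetical.

```python
from collections import Counter

def combine_counts(models, weights):
    """Sum each model's follower counts after scaling them by that model's weight."""
    combined = {}
    for model, weight in zip(models, weights):
        for state, followers in model.items():
            target = combined.setdefault(state, Counter())
            for word, count in followers.items():
                target[word] += count * weight
    return combined

def probability(model, state, word):
    """Probability of `word` following `state` under the combined counts."""
    followers = model[state]
    return followers[word] / sum(followers.values())

# Made-up follower counts for the single state ("very",).
austen_like = {("very",): Counter({"agreeable": 3, "good": 1})}
review_like = {("very",): Counter({"tasty": 4, "good": 4})}

legit = combine_counts([austen_like, review_like], [196, 4])
spam = combine_counts([austen_like, review_like], [3, 70])

for name, model in [("legitimate", legit), ("spam", spam)]:
    print(name, round(probability(model, ("very",), "tasty"), 3))
```

Every follower keeps a nonzero probability in both mixtures, but the weights shift which followers dominate, which is the property the legitimate and spam models rely on.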

In [5]:
# seed both the Python generator and the NumPy one used by SciPy
random.seed(0xc0ffee)
np.random.seed(0xc0ffee)

for i in range(20):
    for s in make_basic_document(5, legitimate_model):
        print(s)
        print("\n###\n")

Please keep a close eye on them because they are dried, so they can't travel around in search of them. This information made Elizabeth smile, as she thought of poor Miss Bingley. Some members of their society sent away, and the horses were baited, he was off. But every thing was soon in a fair way of soon knowing by heart. So far, the worst coffee I ever tried in my Keurig machine, socially responsible and sensitive way they're sourced and manufactured. That will just do for me, you know, to interfere. Well, I never observed that.

###

She had fallen into good hands earlier. I am so very unwell! I fancy Lord S. is very good-humoured and pleasant in his own apartment, had they sat in one equally lively; and she gave herself up for lost.

###

It was to herself an amusing and a very respectable man, though his name was Lindsay--for particular reasons however I shall conceal it under that of Talbot. What a delightful ball we had last night. But they always do, you know. Elizabeth replied

The best ever. To the theatre he went, and reached it just in time by a side glance to see a slight curtsey from Elizabeth herself. But Edmund goes; true, it is upon Edmund's account. There is something in a chapel and chaplain so much in want of money, and they keep their own coach. Suffice it, that he had _cause_ to sigh. I never saw before, it is quite grievous to see her and receive her approbation, very busy and very happy in observing all that was strongest and best in his cellars. Were she your equal in situation--but, Emma, consider how far this is from being the wife of Sir William.

###

The hardness of the pavement for her feet, made him less willing upon the present occasion; he did it, however. Science Diet does, in a very steady friendship, in spite of her anxious desire to penetrate this mystery, to proceed in it to Edinburgh, where I hoped to find her out. Anne could not but recall the attempt with great satisfaction. For a few moments impossible to Fanny's fears that i

My feelings in every way an object of desire; it was a black morning's work for her. As far the taste -- it has a different taste that seems to taste sweeter, lighter, and fluffier. But there are hopes of her being wanted. You are quite enough alike. I use it for baking and frying. Their visitors, except those from Barton Park, were not many; for, in spite of her. Mr. Harris was punctual in his second visit; but he came back he had the pleasure of walking with her till Tuesday.

###

Just the right amount of lemon. Her happiness in going with Miss Tilney, however, prevented their wishing it otherwise; and, as they always are, speculation was decided on almost as soon as they were summoned to it. The only one among them, whose opposition of feeling could excite any serious anxiety was Lady Russell. And this is the only way that Mr. Woodhouse thought it no hardship for either James or Isabella to resent her resistance any longer. Anne was conscious of not doing everything in his favour, 

The stupidest fellow! What probability is there of so desirable an Event; I have had the Martins in the summer. They did not. It was her manner, however, rather than any other Greek or Turkish coffee. She could not be cured of wishing that he would prefer the church to every other objection, would now be William's destination. Though Julia fancies she prefers tragedy, I would not be in Bond-street till just before he mounts his horse to-morrow. This was the fact.

###

I wish he may go to the East room, and took no solitary walk in the rain had reached Mrs. Elton, and her remonstrances now opened upon Jane. She has made me consider improvements _in_ _hand_ as the greatest of all felicities! I have heard it so lately. I will never do in love matters; and that girl is born a simpleton who has it either by nature or affectation. She then proceeded to relate the particulars of Mr. Jennings's last illness, and what he meant, before many meetings had taken place. With these feelings, she rat

Let her go, then. He absolutely started, and for a while put by, unless some tender sonnet, fraught with the apt analogy of the declining year, with declining happiness, and the images of youth and beauty. And you saw the old housekeeper, I suppose? A great deal too kind to her when we married. Still, however, he was not having enough milk for him and her own were perforce delayed a little longer, they returned to Berkeley Street. Do you know, I never wished him to say what she did not love, and obliging her to administer to the adverse passion of the man of the world had interpreted to my discredit?

###

I hope you will like the chain itself, Fanny. Dear Mrs. Weston! always my kindest friend on every occasion. Mary had neither genius nor taste; and though vanity had given her no hopes of a letter of his, I cannot help liking him. We would have persuaded Eloisa to have taken place, there have been any unpleasant glances? He fancied bathing might be good for everyone in this world of c

Miss Crawford smiled her perfect approbation; and hastened to complete the picture of good, the acquisition of a brother? In spite of this release, Frederica still looks unhappy: still fearful, perhaps, of her mother's tender care. Mrs. Weston laughed, and said he did not reign long in peace, for Henry Tudor E. of Richmond as great a gloom as possible over their dinner and dessert. But Jane and Elizabeth, in addition to all the misery of her feelings, their doubt, confusion, and felicity, was enough. Jane says that Colonel Campbell was the giver, I saw it only as a special treat that allows you to select a semi-sweet white wine.

###

He did not love Miss Crawford; but Miss Crawford gave her the severest pain of all. They are very flavorful and spicy, but not in the country. Adeiu my dear Charlotte; although I have not a doubt of its nature, she was anxious to settle, though somewhat embarrassed in speaking of. The product simply tastes horrible. But when a young man, rich and agreeabl

You would not think it proper to join in cutting off the entail, as soon as possible. The portraits themselves seemed to be looking around with a sort of renewed separation from Mansfield; and she could not for the worse. Having now a good deal in Harley Street, as soon as it hits the spot. i subscribed to this product and expect to order from Amazon. The morrow produced no abatement in these happy symptoms. He told her that he appeared to the youthful fancy of her daughter, to whom they could give little fortune; and his prospects of future wealth were exceedingly fair. We copied it from the very first distinction in the scale of vanity. You are not strong. DE COURCY Upper Seymour Street.

###

Mrs. Bennet could hardly contain herself. Which of the two most beloved of the two. Mine arrived within three days. His sisters were anxious for his sister and herself to get acquainted, and forwarded as much as that and more; and I must say, I think she feels it. It proves him unspoilt by his 

Never were such characters cut by any other sentiment. I lay no embargo on any body's words. Smooth, doesn't taste watered down, and no bitter after taste that some do.As stated above, I found this product in my area. Mr. Knightley, who had left the room, resolving that, whatever might be the reason. I have quite a horror of finery. I am very glad you liked her.

###

Jane's curiosity did not appear to me that the chocolate-covered pretzels were not fresh. Our instrument is a capital one, probably superior to----You shall try it some day. No, indeed, Miss Woodhouse, you will not esteem them unreasonable. Great product our dog is eating something healthy. She felt utterly unworthy of such respect, and knew not what to expect to see the house, and in such a state of mind, sighing over the ruin of beauty and health. My mother always just fed them whatever, the kinds of food you buy in the supermarket - Friskies, Nine Lives, Kit & Kaboodle, stuff like that.

###

We had better leave Upperc

In [6]:
random.seed(0xf00)
np.random.seed(0xf00)

for i in range(20):
    for s in make_basic_document(5, spam_model):
        print(s)
        print("\n###\n")

Give it a try! I just added 2 tbsp to the dish when making it; reminiscent of Kraft Mac and Cheese. Practically inedible. Finally, I put my pet on Iams and the digestive system. There are other sources for this product. Aside from being a very yummy tasting brownie. Spinach and mango? Its a wonder I even can recall what it tasted like.

###

I love them for breakfast. Poison.They are not exaggerated reviews - it is lightly crunchy without destroying my dental work. That oiliness is natural and organic--no fields of wheat were sprayed with perfume and then dipped in tropical syrup. Even as I held the treat in half. It has a wonderfully light coconut, mango flavor. I put this candy out and in no way resembles chicken.

###

As for the looks, I don't mind nearly as much! I'm sure it is the highest temperature you can let it steep very long, which is great as a snack twice a day. Why? I typically enjoy smooth medium roasts and I prefer to sweeten my coffee for nothing. I used part of this 

I am disappointed all over again. It takes on the taste of diet pop. It may be pricier but he eats it almost everyday either as breakfast or a snack.. Yum!

###

I primarily drink cappuccinos and lattes. Don't buy, or prepare to clean the things before putting them in the treat balls. With Subscribe & Save, so it works out nicely. Thanks amazon. I have read an article last year about people in other countries on our travels but had run out. Our dog loved them, but they seem a little tougher to chew than hard granola. This is not like the California yeasts that are kind of expensive, I felt it was right to be informed consumers.

###

This stuff is too sweet to drink down like it was covered by taped on labels....over empty JELLO BOXES! I have just one box at my local high priced natural food store. I've now had this replacement probe for approximately 6 months. He is 6 months old.

###

I would never have tried these cans of food were not careful and i drink this twice a day it's prett

Beware folks, the coffee is certainly worth it.FWIW, I also tried adding dried cranberries and cherries, and flax seed to try it today. Perhaps the pieces are pretty small so it's easy to mix, organic, lightweight and environmentally friendly.

###

Saves a trip to get it. My dog loves the how these treats fared with some of reviewers in stating that so many extremely hot sauces lack. I love Kit Kats, but the two I noticed the mold, and it tasted much better.I mix mine in small batches in Kohler Wisconsin. Yeowww! It only needs to be strong. My kids devour it. But if you want to try it.it was the worst case scenario, my puppy could have died if I didn't like spicy, buffalo sauce and wheat thins. So far I have been eating Pfeiffer Caesar dressing for over a year!

###

My baby loves the Earth's Best infant formula and supplement with other child vitamins.I was trying to find a grass that would grow on me. I ordered this brand ONLY because it stated that it was a pleasure to buy most ite

My cats also love the clear instructions that come with it. I have a box on sell. In fact they will fight over who got the bag of dog food that we possibly can but this is definitely going to switch to Premium Edge. I believe that people need to be prepared: first, you open the can and walk away.This is a very satisfying treat.

###

The first three ingredients are a bit hit with our pit bull chico. I have been drinking coconut water and O.N.E. is my favorite coffee. Some discerning reviewers say it has taken a GoGo Squeez with her lunch every day since I got them from the kids! And this is about the right amount of salt and/or sugar added.

###


###

I can eat them all day long. Every time, 90% of the cans had to be restricted to a fish- and vegetable-based diet. Someone drinking it straight up may notice a difference but did not mention that there are other issues with the SodaStream offering. Burgers Smoke House you really out did yourself on this product. Share it or savor it for 

The ingredients are natural - no by-products are used. I've only had this one several times. Addicted after first taste. No mottling on wings.Both moth catchers I received, are triangular open cardboard tubes with the sticky pads, if it touches a leaf it won't come off. It was not. Came fast! was exactly what I was looking for a different product I'm not sure. It's my favorite by far.

###

Unlike the other varieties, but I bet they're pretty good because it is falsely more nutritious when it's the same thing! Love these lightly sweetened iced teas. Very annoying. Be wise. That didn't work either, because she would refuse to eat it. Kills all bugs outside too! I'm so glad Amazon carries the full line of Petite Cusine. Gravy made with this starter and plain yogurt and it is by making a lot more of the Bits of Honey candy than I'd like, that is overridden by the fact that it is organic as well. IT saddens me that Campbell's soup has a hearty flavor and texture.

###

Sometimes, the leg g

Hopefully I will go back to k-cups anymore. These look delicious and when made with milk and honey ~ very nice! You should definitely try this out. It is enjoyed as a young child.... plasticy/rubbery..... I talked with him about her being always hungry. Only, the enjoyment part did not happen. Within 2-3 weeks she was back to her sweet self, but only for three or four hours.We thought she was eating Purina One, the next Natural Balance. This product is delicious!

###

So glad I took the paddle out and had to give up full-fat peanut butter for THIS recipe, but it's better than any other brand.I got a superb service with Amazon.Very, very satisfied.Thank you. Is it to cover up the fact that it is good? A bit pricey but I would not reach for Vanilla flavored anything...just chocolate. They were very thin and was also gluten free, which is a plus.Pick a jar up if you dump it in there.But if you follow vegetarian or kosher/Halal regime. Our 9-year-old shepherd mix is literally a different 

It's good to have a specialty store that you are really worried about it. Would recommend purchasing. You wont regret it.

###

Compared to other big brand names, I think this vanilla salt water taffy has a great price on amazon. There IS a difference! He enjoyed the coffee. In NYC, one box is $5.

###

This is REALLY good peanut butter. What's not to love about that?! Fast and easy to make. First, I dislike the taste of the soup. Fortunately, Amazon provided me with an easy open would be perfect for summer daydreaming, because there's something to please everyone. It has the boldness of regular Starbucks coffee with a bold taste but not bad. Anyway, not wishing to waste any more time and energy on a half dozen of them. They are cooked to perfection.

###

Great that these are much more widely available and the best I have ever tasted. with out any jitters or nervousness. I live in a small bowl. Can't wait for my next order. When I ordered this in the freezer.

###

Since I like the no

We can then generate some example documents and save them to a file for use in the next notebook.  

In [None]:
import pandas as pd
pd.set_option("io.parquet.engine", "pyarrow")

random.seed(0xda7aba5e)
np.random.seed(0xda7aba5e)

output = []

for (whence, model) in [("legitimate", legitimate_model), ("spam", spam_model)]:
    for f in range(20000):
        output.append({"whence" : whence, "text" : make_basic_document(5, model)})

df = pd.DataFrame(output)
df.to_parquet("data/training.parquet")

In [None]:
pd.set_option("io.parquet.engine", "pyarrow")

random.seed(0xda7aba5e)
np.random.seed(0xda7aba5e)

output = []

for (whence, model) in [("legitimate", austen_model), ("spam", negative_model), ("spam", positive_model)]:
    for f in range(50):
        output.append({"whence" : whence, "text" : make_basic_document(5, model)})

df = pd.DataFrame(output)
df.to_parquet("data/simple-training.parquet")