
xguo7/AIT690-Assignment2


AIT690-Assignment2

This Python program, ngram.py, learns an N-gram language model from an arbitrary number of plain-text files and then generates a requested number of sentences from that model.

The program works for any value of n and outputs as many sentences (m) as the user requests. You can run the program as follows:
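Generation in an N-gram model works by repeatedly sampling the next word from the counts observed after the previous n - 1 words, until an end-of-sentence marker is drawn. The sketch below shows that loop under stated assumptions. It is not taken from ngram.py. The helper names `generate_sentence` and `generate`, the `<s>`/`</s>` boundary markers, and the `max_len` cap are all hypothetical. The function expects a table that maps each (n - 1)-word history tuple to a `collections.Counter` of the words that followed it.

```python
import random
from collections import Counter

# Hypothetical sentence-boundary markers; ngram.py may use different ones.
START, END = "<s>", "</s>"

def generate_sentence(cfd, n, rng, max_len=50):
    """Sample one sentence from `cfd`, a dict mapping an (n-1)-word history tuple
    to a Counter of the words observed after that history."""
    history = (START,) * (n - 1)  # pad so the first word has a full history
    words = []
    while len(words) < max_len:  # cap length in case END is never drawn
        candidates, weights = zip(*cfd[history].items())
        word = rng.choices(candidates, weights=weights)[0]  # draw in proportion to count
        if word == END:
            break
        words.append(word)
        history = (history + (word,))[1:]  # slide the window; stays () when n == 1
    return " ".join(words)

def generate(cfd, n, m, seed=None):
    """Generate m sentences; pass a seed for reproducible output."""
    rng = random.Random(seed)
    return [generate_sentence(cfd, n, rng) for _ in range(m)]

if __name__ == "__main__":
    # Toy bigram table (n = 2), written by hand for illustration only.
    toy = {
        (START,): Counter({"the": 2}),
        ("the",): Counter({"cat": 1, "dog": 1}),
        ("cat",): Counter({"sat": 1}),
        ("dog",): Counter({"ran": 1}),
        ("sat",): Counter({END: 1}),
        ("ran",): Counter({END: 1}),
    }
    for sentence in generate(toy, n=2, m=3, seed=42):
        print(sentence)
```

With the toy table, every generated sentence is either "the cat sat" or "the dog ran". Because the sampler only follows histories it has seen, each history it reaches has at least one recorded continuation.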

ngram.py n m input-file/s

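n is the order of the model (the number of words in each n-gram, e.g. 3 for a trigram model), m is the number of sentences to generate, and the remaining arguments are one or more input text files.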
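The README does not say how ngram.py reads its arguments; it may index `sys.argv` directly. As an assumption, the `n m input-file(s)` interface can be expressed with the standard library's `argparse` module. The `parse_args` helper below is hypothetical.

```python
import argparse

def parse_args(argv=None):
    """Parse `n m input-file(s)` (a sketch, not ngram.py's own parser)."""
    parser = argparse.ArgumentParser(description="Generate sentences from an N-gram model.")
    parser.add_argument("n", type=int, help="order of the model, e.g. 3 for trigrams")
    parser.add_argument("m", type=int, help="number of sentences to generate")
    parser.add_argument("files", nargs="+", help="one or more plain-text input files")
    args = parser.parse_args(argv)
    if args.n < 1 or args.m < 1:
        parser.error("n and m must be positive integers")  # prints usage and exits
    return args
```

Using `nargs="+"` makes at least one input file mandatory while still accepting an arbitrary number of them.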

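For example, the following command builds a trigram model from two Jane Austen novels and generates 10 sentences:

ngram.py 3 10 'austen-emma.txt' 'austen-persuasion.txt'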
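In this example n = 3, so each word is predicted from the two words before it. The README does not state whether ngram.py applies smoothing. Assuming plain maximum-likelihood estimates, where C(...) counts occurrences of a word sequence in the training text, the trigram probability and its general form for any n are:

```latex
P(w_i \mid w_{i-2}, w_{i-1}) = \frac{C(w_{i-2}\, w_{i-1}\, w_i)}{C(w_{i-2}\, w_{i-1})}
\qquad
P(w_i \mid w_{i-n+1}, \ldots, w_{i-1}) = \frac{C(w_{i-n+1} \cdots w_{i-1}\, w_i)}{C(w_{i-n+1} \cdots w_{i-1})}
```

A generated sentence is produced by drawing one word at a time from this distribution, starting from sentence-start padding.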

The .txt files used in this project come from Project Gutenberg (http://www.gutenberg.org). You can choose from the following file names:

'austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt', 'bible-kjv.txt', 'blake-poems.txt', 'bryant-stories.txt',
'burgess-busterbrown.txt', 'carroll-alice.txt', 'chesterton-ball.txt', 'chesterton-brown.txt', 'chesterton-thursday.txt',
'edgeworth-parents.txt', 'melville-moby_dick.txt', 'milton-paradise.txt', 'shakespeare-caesar.txt', 'shakespeare-hamlet.txt', 'shakespeare-macbeth.txt', 'whitman-leaves.txt'
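These names appear to match the Project Gutenberg selection that ships with NLTK's data package. If ngram.py loads them through NLTK, a call such as `nltk.corpus.gutenberg.sents('austen-emma.txt')` (after `nltk.download('gutenberg')`) returns them already split into tokenized sentences. For arbitrary plain-text files, a dependency-free loader might look like the sketch below. The `load_sentences` helper and its regex sentence splitter and tokenizer are hypothetical simplifications. They are not NLTK's tokenizers.

```python
import re

# Hypothetical, deliberately simple rules: split after ., ! or ? followed by whitespace,
# then keep words (with an optional apostrophe suffix) and common punctuation as tokens.
SENT_SPLIT = re.compile(r"(?<=[.!?])\s+")
TOKEN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?|[.,!?;:]")

def load_sentences(paths):
    """Read plain-text files and return a list of sentences, each a list of lowercase tokens."""
    sentences = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            text = f.read()
        for chunk in SENT_SPLIT.split(text):
            tokens = TOKEN.findall(chunk.lower())
            if tokens:  # skip empty chunks, e.g. trailing whitespace
                sentences.append(tokens)
    return sentences
```

A regex splitter will mis-split abbreviations such as "Mr." and "Mrs.", which are frequent in these novels. A proper sentence tokenizer avoids that.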

Some of the code for loading the files and computing the conditional frequency distribution is adapted from the NLTK Book (https://www.nltk.org/book/).
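The NLTK Book builds tables like this with `nltk.ConditionalFreqDist`, fed with (condition, word) pairs. Below is a plain-Python stand-in that uses a `defaultdict` of `Counter`s, keyed by the (n - 1)-word history. It also includes a maximum-likelihood lookup matching the count ratio C(history, word) / C(history). The `build_cfd` and `mle` names and the `<s>`/`</s>` padding are assumptions for illustration, not ngram.py's code.

```python
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical boundary markers

def build_cfd(sentences, n):
    """Map each (n-1)-word history tuple to a Counter of the words that follow it."""
    cfd = defaultdict(Counter)
    for tokens in sentences:
        # Pad with n-1 start markers so every real word has a full history,
        # and append END so the model learns where sentences stop.
        padded = [START] * (n - 1) + list(tokens) + [END]
        for i in range(n - 1, len(padded)):
            history = tuple(padded[i - n + 1:i])  # empty tuple when n == 1
            cfd[history][padded[i]] += 1
    return cfd

def mle(cfd, history, word):
    """Maximum-likelihood estimate of P(word | history); 0.0 for unseen histories."""
    counts = cfd.get(tuple(history))  # .get avoids creating empty entries
    if not counts:
        return 0.0
    return counts[word] / sum(counts.values())
```

For the two sentences "the cat sat" and "the dog sat" with n = 2, the history ("the",) is followed once by "cat" and once by "dog", so each has probability 0.5.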
