Course page for TDT4310 Intelligent Text Analytics and Language Understanding, spring 2024.

tollefj/TDT4310

TDT4310 - Intelligent Text Analytics and Language Understanding - Spring 2024

Throughout this course, we will explore many aspects of natural language processing, starting with the latest developments in language models, specifically large language models. From there, we go back to more fundamental topics such as part-of-speech tagging, grammars, and dependency parsing, as well as tasks like sentiment analysis and topic modeling.

All labs will be provided as Jupyter Notebooks (.ipynb). The first lab consists only of questions answered in Markdown cells, so that you can get familiar with the format. The remaining labs require you to use the notebook environment properly, with a mix of Markdown and code cells.

You must pass all labs to be eligible for the exam.

🔧 Lab setup

Each lab will have files starting with the prefix lab{N}, where $N \in \{1, 2, 3, 4, 5\}$. Each lab will have at least two files:

  • lab{N}_description.md - a description of the lab
  • lab{N}_exercises.ipynb - the main notebook with the exercises
    • you will submit this file to Blackboard (renamed as described under Delivery)

📝 Delivery

By the deadline for each lab, you will submit your lab{N}_exercises_{your-username}.ipynb file to Blackboard. You can submit as many times as you want; only the last submission will be considered.
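The naming convention above can be sketched in a few lines of Python. This is only an illustration, not an official course tool: the helper names `submission_filename` and `prepare_submission` are hypothetical, and the sketch assumes the exercise notebook sits in the directory you pass in and that labs are numbered 1 to 5 as stated under Lab setup.

```python
import shutil
from pathlib import Path

# Labs are numbered 1 through 5 according to the lab setup section.
VALID_LABS = range(1, 6)


def submission_filename(lab: int, username: str) -> str:
    """Build the file name lab{N}_exercises_{your-username}.ipynb (hypothetical helper)."""
    if lab not in VALID_LABS:
        raise ValueError(f"lab must be between 1 and 5, got {lab}")
    username = username.strip()
    if not username:
        raise ValueError("username must not be empty")
    return f"lab{lab}_exercises_{username}.ipynb"


def prepare_submission(lab: int, username: str, directory: str = ".") -> Path:
    """Copy lab{N}_exercises.ipynb to its submission name and return the new path.

    Assumes the original notebook lives in `directory`; the original is left untouched.
    """
    source = Path(directory) / f"lab{lab}_exercises.ipynb"
    target = source.with_name(submission_filename(lab, username))
    shutil.copyfile(source, target)
    return target
```

For example, `prepare_submission(2, "olanor")` would copy `lab2_exercises.ipynb` to `lab2_exercises_olanor.ipynb` in the current directory, where `olanor` stands in for your own username. Uploading the copy to Blackboard remains a manual step.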

📆 Schedule

| Lab | Link | Published | Deadline | Topic | Libraries | Chapters |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Lab1 | Jan. 8 | Jan. 22 | Large language models | transformers | - |
| 2 | Lab2 | Jan. 22 | Feb. 5 | Tokenization, introduction to word vectors and language modeling | NLTK | 2, 3 |
| 3 | Lab3 | Feb. 5 | Feb. 19 | Part-of-speech tagging, stemming/lemmatization, TF-IDF | NLTK, spaCy | 4, 5, 6 |
| 4 | Lab4 | Feb. 19 | Mar. 4 | WordNet and SentiWordNet, dependency parsing, POS chunking | spaCy, scikit-learn | 7, 8 |
| 5 | Lab5 | Mar. 4 | Mar. 18 | Unsupervised topic modeling and named entities | Gensim | 9, 10, 11 |

📚 Curriculum

The course curriculum is mostly based on the 2022 book Getting Started with Natural Language Processing by Ekaterina Kochmar. It is available from Akademika.
