EACL 2024 (findings) paper: "Towards Unified Uni- and Multi-modal News Headline Generation"
We include some of the code used in our experiments. It is structured as follows:
```
├── model
│   ├── modeling_umms_blip2.py - BLIP-2 extended to handle articles with multiple images and text-only articles
│   ├── model_BLIP2_umms.py - Trainer class for the T5CLIP models
│   └── model_umms.py - Trainer class for the modified BLIP-2 model
└── preprocess
    └── filter_and_split_PENS.py - pre-processing of the PENS dataset
```
The text-only PENS dataset can be obtained here, the video-based MLASK here, and the image-based M3LS here.
In Appendix A.3 of the paper, we describe the procedure for collecting the image targets for M3LS articles. The textual data (articles/titles/abstracts) and the source images should be collected from the original repository. In data/M3LS, we share an M3LS_ref_images.tsv file with two columns: HASH (a hashed ID that can be used to match with the corresponding article in M3LS) and REF_IMAGE_URL (pointing to the image target, i.e., the pictorial summary). Collect the images yourself, e.g., with wget; a minimal download sketch follows.
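The sketch below is one way to do the download, assuming M3LS_ref_images.tsv has a header row with the columns HASH and REF_IMAGE_URL as described above; the output directory and the .jpg extension are illustrative choices, not part of the released code.

```python
# Hypothetical download helper for the M3LS reference images.
# Assumes data/M3LS/M3LS_ref_images.tsv has a header row with the
# columns HASH and REF_IMAGE_URL; output dir and extension are illustrative.
import csv
import pathlib
import urllib.request

TSV_PATH = pathlib.Path("data/M3LS/M3LS_ref_images.tsv")
OUT_DIR = pathlib.Path("data/M3LS/ref_images")
OUT_DIR.mkdir(parents=True, exist_ok=True)

with TSV_PATH.open(newline="") as f:
    for row in csv.DictReader(f, delimiter="\t"):
        # Name each file after the article hash so it can later be
        # matched back to the corresponding M3LS article.
        target = OUT_DIR / f"{row['HASH']}.jpg"
        if target.exists():
            continue  # allows resuming an interrupted download
        try:
            urllib.request.urlretrieve(row["REF_IMAGE_URL"], target)
        except OSError as exc:  # dead links happen; skip and report
            print(f"Failed to fetch {row['HASH']}: {exc}")
```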
Once downloaded, process the images with data/M3LS/crop_reference_images.py. This script removes a horizontal strip from the bottom of each image to delete a watermark (open one of the images and see for yourself); left in place, the watermark would give the model an easy cue for distinguishing the target images.
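For reference, the sketch below illustrates the cropping idea; it is not the released script itself, and STRIP_HEIGHT is a hypothetical placeholder for the strip height actually set in crop_reference_images.py.

```python
# Sketch of the watermark-removal idea behind
# data/M3LS/crop_reference_images.py: cut a fixed-height strip off the
# bottom of every image. STRIP_HEIGHT is a hypothetical placeholder;
# use the value set in the released script.
import pathlib
from PIL import Image

STRIP_HEIGHT = 20  # pixels removed from the bottom (illustrative value)
IMG_DIR = pathlib.Path("data/M3LS/ref_images")

for path in IMG_DIR.glob("*.jpg"):
    with Image.open(path) as img:
        img.load()  # read pixel data before we overwrite the file
        width, height = img.size
        # Keep everything above the watermark strip.
        cropped = img.crop((0, 0, width, height - STRIP_HEIGHT))
    cropped.save(path)  # overwrite in place
```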
Our code is released under Apache License 2.0, unless stated otherwise.