The WSJ, CTB, SPMRL, and UD treebanks are widely used for evaluation. Check out XCFG for treebank preprocessing.
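XCFG has its own pipeline and options; as a rough illustration of the usual cleanup step before induction or evaluation (stripping function tags and punctuation from PTB-style trees), here is a minimal sketch using NLTK. The example tree and the punctuation-tag set are assumptions for the demo, not taken from XCFG.

```python
from nltk import Tree

# Common PTB punctuation POS tags (illustrative subset).
PUNCT_TAGS = {",", ".", ":", "``", "''", "-LRB-", "-RRB-"}

def clean_tree(bracketed: str) -> Tree:
    """Strip function tags/indices from labels and drop punctuation leaves."""
    tree = Tree.fromstring(bracketed)
    for sub in tree.subtrees():
        label = sub.label()
        if not label.startswith("-"):  # keep -LRB-/-NONE--style labels intact
            # "NP-SBJ-1" -> "NP": drop function tags and coindexation.
            sub.set_label(label.split("-")[0].split("=")[0])
    # Delete punctuation preterminals, deepest/rightmost positions first
    # so earlier positions stay valid.
    for pos in sorted(tree.treepositions("leaves"), reverse=True):
        if tree[pos[:-1]].label() in PUNCT_TAGS:
            del tree[pos[:-1]]
    return tree

example = "(S (NP-SBJ (DT The) (NN dog)) (VP (VBD barked)) (. .))"
print(clean_tree(example))
# (S (NP (DT The) (NN dog)) (VP (VBD barked)))
```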
- Compound Probabilistic Context-Free Grammars for Grammar Induction, [paper], [code].
- Unsupervised Recurrent Neural Network Grammars, [paper], [code].
- Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Autoencoders, [paper], [code].
- Unsupervised Learning of PCFGs with Normalizing Flow, [paper], [code].
- Visually Grounded Neural Syntax Acquisition, [paper], [code].
- Visually Grounded Compound PCFGs, [paper], [code].
- VLGrammar: Grounded Grammar Induction of Vision and Language, [paper], [code].
- Neural Language Modeling by Jointly Learning Syntax and Lexicon, [paper], [code].
- Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks, [paper], [code].
- Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction, [paper], [code].
- Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT, [paper], [code].
- The Return of Lexical Dependencies: Neural Lexicalized PCFGs, [paper], [code].
- Neural Bi-Lexicalized PCFG Induction, [paper], [code].
- Bootstrapping Language Acquisition, [paper], [code].
- Scalable Syntax-Aware Language Models Using Knowledge Distillation, [paper], [code].
- Language Modeling with Shared Grammar, [paper], [code].
- Learning to Compose Task-Specific Tree Structures, [paper], [code].
- Learning Latent Trees with Stochastic Perturbations and Differentiable Dynamic Programming, [paper], [code].
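Many of the PCFG-based entries above (compound PCFGs, lexicalized and bi-lexicalized PCFGs, PCFGs with normalizing flows) are trained by maximizing the sentence marginal likelihood computed with the inside algorithm. The sketch below shows that dynamic program on a toy hand-written CNF grammar; the grammar, rule names, and sentence are illustrative assumptions, and real systems parameterize the rule probabilities with neural networks and run the chart in log space over dense tensors.

```python
import math

# Toy CNF grammar (illustrative, not from any listed codebase).
binary_rules = {            # (parent, left, right) -> probability
    ("S", "NP", "VP"): 1.0,
    ("NP", "DT", "NN"): 1.0,
    ("VP", "VB", "NP"): 1.0,
}
lexical_rules = {           # (preterminal, word) -> probability
    ("DT", "the"): 1.0,
    ("NN", "dog"): 0.5,
    ("NN", "cat"): 0.5,
    ("VB", "saw"): 1.0,
}

def inside_log_likelihood(words, root="S"):
    """Return log P(words) under the toy PCFG via the inside algorithm."""
    n = len(words)
    # chart[i][j][A] = P(A derives words[i..j]), stored sparsely as dicts.
    chart = [[{} for _ in range(n)] for _ in range(n)]
    # Base case: length-1 spans come from lexical rules.
    for i, w in enumerate(words):
        for (pre, word), p in lexical_rules.items():
            if word == w:
                chart[i][i][pre] = chart[i][i].get(pre, 0.0) + p
    # Recursion: combine two adjacent sub-spans with a binary rule.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):                       # split point
                for (a, b, c), p in binary_rules.items():
                    left = chart[i][k].get(b, 0.0)
                    right = chart[k + 1][j].get(c, 0.0)
                    if left and right:
                        chart[i][j][a] = chart[i][j].get(a, 0.0) + p * left * right
    return math.log(chart[0][n - 1][root])

print(inside_log_likelihood(["the", "dog", "saw", "the", "cat"]))  # log(0.25) ~ -1.386
```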