Problems about the head2span #3

Closed
yhcc opened this issue Nov 13, 2021 · 3 comments

@yhcc

yhcc commented Nov 13, 2021

Sorry to bother you again. I found that some spans present in span_clusters are missing from head2span, as in the processed sample below:

{'document_id': 'bc/cctv/00/cctv_0001', 'cased_words': ['By', '1940', ',', 'China', "'s", 'War', 'of', 'Resistance', 'against', 'Japan', 'had', 'entered', 'a', 'stalemate', '.', 'The', 'situation', 'on', 'our', 'side', 'and', 'the', 'enemy', "'s", 'side', 'was', 'intertwined', '.', 'The', 'Eighth', 'Route', 'Army', 'guerrillas', 'were', 'extraordinarily', 'active', ',', 'creating', 'more', 'and', 'more', 'trouble', 'for', 'the', 'Japanese', 'army', 'in', 'North', 'China', '.', 'Hayao', 'Tada', ',', 'commander', 'of', 'the', 'Japanese', 'North', 'China', 'Area', 'Army', ',', 'adopted', 'a', 'strategy', 'of', 'siege', 'warfare', 'to', 'deal', 'with', 'the', 'Eighth', 'Route', 'Army', '.', 'The', 'specific', 'method', 'was', 'building', 'a', 'closely', 'connected', 'transport', 'network', ',', 'with', 'a', 'road', 'for', 'every', 'village', 'and', 'defensive', 'towers', 'on', 'every', 'road', '.', 'Roads', 'and', 'railways', 'were', 'used', 'as', 'links', 'to', 'connect', 'all', 'of', 'North', 'China', 'into', 'a', 'solid', ',', 'widespread', 'siege', ',', 'in', 'order', 'to', 'strangle', 'the', 'Eighth', 'Route', 'Army', 'and', 'its', 'base', 'areas', 'in', 'this', 'net', '.', 'As', 'part', 'of', 'the', 'Japanese', 'army', "'s", 'strategy', 'of', 'siege', 'warfare', ',', 'railways', 'and', 'roads', 'had', 'actually', 'become', 'the', 'Japanese', 'army', "'s", 'weapons', 'of', 'war', ',', 'becoming', 'a', 'great', 'threat', 'to', 'the', 'base', 'areas', '.', 'In', 'December', '1939', ',', 'Commander', '-', 'in', '-', 'chief', 'Zhu', 'De', 'and', 'Vice', 'Commander', 'Peng', 'Dehuai', 'of', 'the', 'Eighth', 'Route', 'Army', 'received', 'a', 'top', '-', 'secret', 'telegram', 'from', 'Commander', 'Lu', 'Zhengcao', 'of', 'the', 'Jizhong', 'Military', 'District', ',', 'among', 'other', 'people', '.', 'The', 'telegram', 'said', 'that', 'the', 'Japanese', 'troops', 'were', 'building', 'blockade', 'trenches', 'and', 'chessboard', '-', 'like', 'roads', 'to', 'divide', 'the', 
'Jizhong', 'base', 'area', 'into', 'small', 'isolated', 'blocks', 'without', 'the', 'ability', 'to', 'mutually', 'communicate', 'and', 'support', 'each', 'other', ',', 'causing', 'the', 'Eighth', 'Route', 'Army', 'and', 'the', 'guerrillas', 'to', 'lose', 'maneuverability', '.', 'Before', 'the', 'Hundred', 'Regiments', 'Offensive', 'in', '1940', ',', 'an', 'inclination', 'to', 'compromise', ',', 'ah', ',', 'surrender', ',', 'was', 'an', 'extremely', 'serious', 'crisis', 'in', 'the', 'frontline', 'situation', 'in', 'China', '.', 'Well', ',', 'on', 'the', 'battlefield', 'behind', 'enemy', 'lines', ',', 'in', 'order', 'to', 'take', 'over', ',', 'consolidate', 'the', 'area', 'under', 'its', 'occupation', ',', 'Japan', 'began', 'a', 'new', 'strategy', '.', 'That', 'was', 'to', 'use', 'railways', 'as', 'a', 'pillar', ',', 'roads', 'as', 'a', 'chain', ',', 'and', 'strongholds', 'as', 'a', 'lock', ',', 'to', 'carry', 'out', 'siege', 'warfare', 'in', 'an', 'attempt', 'to', 'divide', 'the', 'base', 'areas', 'behind', 'enemy', 'lines', ',', 'ah', ',', 'so', 'as', ',', 'er', ',', 'to', 'cut', 'off', 'their', 'communication', 'with', 'one', 'another', '.', 'In', 'addition', ',', 'it', 'relied', 'on', 'this', 'cage', ',', 'ah', ',', 'to', 'further', 'strengthen', 'its', 'assaults', 'against', 'the', 'base', 'areas', '.', 'Er', '.', 'So', ',', 'it', 'was', 'amidst', 'such', 'a', 'grave', 'international', 'and', 'domestic', 'situation', 'that', 'the', 'Eighth', 'Route', 'Army', 'led', 'by', 'the', 'Chinese', 'Communist', 'Party', ',', 'ah', ',', 'launched', ',', 'ah', ',', 'a', 'strategic', 'offensive', 'called', 'the', 'Hundred', 'Regiments', 'Offensive', '.', 'This', 'plot', 'of', 'the', 'Japanese', 'army', 'drew', 'great', 'attention', 'from', 'Zhu', 'De', 'and', 'Peng', 'Dehuai', 'of', 'Eighth', 'Route', 'Army', 'headquarters', '.', 'After', 'meticulous', 'studies', 'and', 'painstaking', 'preparations', 'by', 'many', 'parties', ',', 'a', 'battle', 'plan', 'based', 'on', 
'surprise', 'was', 'formulated', '.', 'On', 'July', '22', ',', '1940', ',', 'a', 'campaign', 'preparation', 'order', 'to', 'attack', 'the', 'Zhengtai', 'Railway', ',', 'jointly', 'signed', 'by', 'Zhu', 'De', ',', 'Peng', 'Dehuai', ',', 'and', 'Zuo', 'Quan', ',', 'was', 'sent', 'to', "Yan'an", 'and', 'all', 'units', 'of', 'the', 'Eighth', 'Route', 'Army', '.', 'What', 'was', 'the', ',', 'purpose', 'and', 'goal', 'of', 'this', 'campaign', '?', 'It', 'was', 'to', 'break', 'through', 'the', 'Japanese', 'army', "'s", 'siege', 'policy', 'against', 'base', 'areas', 'behind', 'enemy', 'lines', ',', 'and', 'to', 'avert', 'the', 'crisis', 'of', 'China', "'s", 'compromise', 'and', 'surrender', '.', 'It', 'was', 'to', 'overcome', 'this', 'crisis', '.', 'Well', ',', 'the', 'Hundred', 'Regiments', 'Offensive', 'was', 'divided', 'into', 'three', 'phases', '.', 'Beginning', 'from', 'August', '20', ',', 'from', 'August', '20', 'to', 'September', '10', ',', 'the', 'main', 'purpose', 'of', 'the', 'campaign', 'was', 'to', 'sabotage', 'the', 'Zhengtai', 'Railway', '.'], 'sent_id': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22], 'part_id': 1, 'speaker': ['Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 
'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 
'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 
'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 
'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Speaker#1', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang', 'Luo_huanzhang,Speaker#1', 'Luo_huanzhang,Speaker#1', 'Luo_huanzhang,Speaker#1', 'Luo_huanzhang,Speaker#1', 'Luo_huanzhang,Speaker#1', 'Luo_huanzhang,Speaker#1', 'Luo_huanzhang,Speaker#1', 'Luo_huanzhang,Speaker#1', 'Luo_huanzhang,Speaker#1', 'Luo_huanzhang,Speaker#1', 'Luo_huanzhang,Speaker#1', 'Luo_huanzhang,Speaker#1', 
'Luo_huanzhang,Speaker#1', 'Luo_huanzhang,Speaker#1', 'Luo_huanzhang,Speaker#1', 'Luo_huanzhang,Speaker#1', 'Luo_huanzhang,Speaker#1', 'Luo_huanzhang,Speaker#1', 'Luo_huanzhang,Speaker#1', 'Luo_huanzhang,Speaker#1', 'Luo_huanzhang,Speaker#1', 'Luo_huanzhang,Speaker#1', 'Luo_huanzhang,Speaker#1', 'Luo_huanzhang,Speaker#1', 'Luo_huanzhang,Speaker#1'], 'pos': ['IN', 'CD', ',', 'NNP', 'POS', 'NNP', 'IN', 'NN', 'IN', 'NNP', 'VBD', 'VBN', 'DT', 'NN', '.', 'DT', 'NN', 'IN', 'PRP$', 'NN', 'CC', 'DT', 'NN', 'POS', 'NN', 'VBD', 'VBN', '.', 'DT', 'NNP', 'NNP', 'NNP', 'NNS', 'VBD', 'RB', 'JJ', ',', 'VBG', 'JJR', 'CC', 'JJR', 'NN', 'IN', 'DT', 'JJ', 'NN', 'IN', 'NNP', 'NNP', '.', 'NNP', 'NNP', ',', 'NN', 'IN', 'DT', 'NNP', 'NNP', 'NNP', 'NNP', 'NNP', ',', 'VBD', 'DT', 'NN', 'IN', 'NN', 'NN', 'TO', 'VB', 'IN', 'DT', 'NNP', 'NNP', 'NNP', '.', 'DT', 'JJ', 'NN', 'VBD', 'VBG', 'DT', 'RB', 'VBN', 'NN', 'NN', ',', 'IN', 'DT', 'NN', 'IN', 'DT', 'NN', 'CC', 'NN', 'NNS', 'IN', 'DT', 'NN', '.', 'NNS', 'CC', 'NNS', 'VBD', 'VBN', 'IN', 'NNS', 'TO', 'VB', 'DT', 'IN', 'NNP', 'NNP', 'IN', 'DT', 'JJ', ',', 'JJ', 'NN', ',', 'IN', 'NN', 'TO', 'VB', 'DT', 'NNP', 'NNP', 'NNP', 'CC', 'PRP$', 'NN', 'NNS', 'IN', 'DT', 'NN', '.', 'IN', 'NN', 'IN', 'DT', 'JJ', 'NN', 'POS', 'NN', 'IN', 'NN', 'NN', ',', 'NNS', 'CC', 'NNS', 'VBD', 'RB', 'VBN', 'DT', 'JJ', 'NN', 'POS', 'NNS', 'IN', 'NN', ',', 'VBG', 'DT', 'JJ', 'NN', 'IN', 'DT', 'NN', 'NNS', '.', 'IN', 'NNP', 'CD', ',', 'NNP', 'HYPH', 'IN', 'HYPH', 'NNP', 'NNP', 'NNP', 'CC', 'NNP', 'NNP', 'NNP', 'NNP', 'IN', 'DT', 'NNP', 'NNP', 'NNP', 'VBD', 'DT', 'JJ', 'HYPH', 'NN', 'NN', 'IN', 'NNP', 'NNP', 'NNP', 'IN', 'DT', 'NNP', 'NNP', 'NNP', ',', 'IN', 'JJ', 'NNS', '.', 'DT', 'NN', 'VBD', 'IN', 'DT', 'JJ', 'NNS', 'VBD', 'VBG', 'NN', 'NNS', 'CC', 'NN', 'HYPH', 'JJ', 'NNS', 'TO', 'VB', 'DT', 'NNP', 'NN', 'NN', 'IN', 'JJ', 'VBN', 'NNS', 'IN', 'DT', 'NN', 'TO', 'RB', 'VB', 'CC', 'VB', 'DT', 'JJ', ',', 'VBG', 'DT', 'NNP', 'NNP', 'NNP', 'CC', 'DT', 'NNS', 'TO', 'VB', 'NN', 
'.', 'IN', 'DT', 'NNP', 'NNPS', 'NNP', 'IN', 'CD', ',', 'DT', 'NN', 'TO', 'VB', ',', 'UH', ',', 'VB', ',', 'VBD', 'DT', 'RB', 'JJ', 'NN', 'IN', 'DT', 'NN', 'NN', 'IN', 'NNP', '.', 'UH', ',', 'IN', 'DT', 'NN', 'IN', 'NN', 'NNS', ',', 'IN', 'NN', 'TO', 'VB', 'RP', ',', 'VB', 'DT', 'NN', 'IN', 'PRP$', 'NN', ',', 'NNP', 'VBD', 'DT', 'JJ', 'NN', '.', 'DT', 'VBD', 'TO', 'VB', 'NNS', 'IN', 'DT', 'NN', ',', 'NNS', 'IN', 'DT', 'NN', ',', 'CC', 'NNS', 'IN', 'DT', 'NN', ',', 'TO', 'VB', 'RP', 'NN', 'NN', 'IN', 'DT', 'NN', 'TO', 'VB', 'DT', 'NN', 'NNS', 'IN', 'NN', 'NNS', ',', 'UH', ',', 'RB', 'IN', ',', 'UH', ',', 'TO', 'VB', 'RP', 'PRP$', 'NN', 'IN', 'CD', 'DT', '.', 'IN', 'NN', ',', 'PRP', 'VB', 'IN', 'DT', 'NN', ',', 'UH', ',', 'TO', 'RBR', 'VB', 'PRP$', 'NNS', 'IN', 'DT', 'NN', 'NNS', '.', 'UH', '.', 'CC', ',', 'PRP', 'VBD', 'IN', 'PDT', 'DT', 'JJ', 'JJ', 'CC', 'JJ', 'NN', 'IN', 'DT', 'NNP', 'NNP', 'NNP', 'VBN', 'IN', 'DT', 'JJ', 'NNP', 'NNP', ',', 'UH', ',', 'VBD', ',', 'UH', ',', 'DT', 'JJ', 'NN', 'VBN', 'DT', 'NNP', 'NNPS', 'NNP', '.', 'DT', 'NN', 'IN', 'DT', 'JJ', 'NN', 'VBD', 'JJ', 'NN', 'IN', 'NNP', 'NNP', 'CC', 'NNP', 'NNP', 'IN', 'NNP', 'NNP', 'NNP', 'NN', '.', 'IN', 'JJ', 'NNS', 'CC', 'JJ', 'NNS', 'IN', 'JJ', 'NNS', ',', 'DT', 'NN', 'NN', 'VBN', 'IN', 'NN', 'VBD', 'VBN', '.', 'IN', 'NNP', 'CD', ',', 'CD', ',', 'DT', 'NN', 'NN', 'NN', 'TO', 'VB', 'DT', 'NNP', 'NNP', ',', 'RB', 'VBN', 'IN', 'NNP', 'NNP', ',', 'NNP', 'NNP', ',', 'CC', 'NNP', 'NNP', ',', 'VBD', 'VBN', 'IN', 'NNP', 'CC', 'DT', 'NNS', 'IN', 'DT', 'NNP', 'NNP', 'NNP', '.', 'WP', 'VBD', 'DT', ',', 'NN', 'CC', 'NN', 'IN', 'DT', 'NN', '.', 'PRP', 'VBD', 'TO', 'VB', 'IN', 'DT', 'JJ', 'NN', 'POS', 'NN', 'NN', 'IN', 'NN', 'NNS', 'IN', 'NN', 'NNS', ',', 'CC', 'TO', 'VB', 'DT', 'NN', 'IN', 'NNP', 'POS', 'NN', 'CC', 'NN', '.', 'PRP', 'VBD', 'TO', 'VB', 'DT', 'NN', '.', 'UH', ',', 'DT', 'NNP', 'NNPS', 'NNP', 'VBD', 'VBN', 'IN', 'CD', 'NNS', '.', 'VBG', 'IN', 'NNP', 'CD', ',', 'IN', 'NNP', 'CD', 'IN', 'NNP', 'CD', 
',', 'DT', 'JJ', 'NN', 'IN', 'DT', 'NN', 'VBD', 'TO', 'VB', 'DT', 'NNP', 'NNP', '.'], 'deprel': ['prep', 'pobj', 'punct', 'poss', 'possessive', 'nsubj', 'prep', 'pobj', 'prep', 'pobj', 'aux', 'root', 'det', 'dobj', 'punct', 'det', 'nsubjpass', 'prep', 'poss', 'pobj', 'cc', 'det', 'poss', 'possessive', 'conj', 'auxpass', 'root', 'punct', 'det', 'nn', 'nn', 'nn', 'nsubj', 'cop', 'advmod', 'root', 'punct', 'xcomp', 'amod', 'cc', 'conj', 'dobj', 'prep', 'det', 'amod', 'pobj', 'prep', 'nn', 'pobj', 'punct', 'nn', 'nsubj', 'punct', 'appos', 'prep', 'det', 'nn', 'nn', 'nn', 'nn', 'pobj', 'punct', 'root', 'det', 'dobj', 'prep', 'nn', 'pobj', 'aux', 'xcomp', 'prep', 'det', 'nn', 'nn', 'pobj', 'punct', 'det', 'amod', 'nsubj', 'aux', 'root', 'det', 'advmod', 'amod', 'nn', 'dobj', 'punct', 'prep', 'det', 'pobj', 'prep', 'det', 'pobj', 'cc', 'nn', 'conj', 'prep', 'det', 'pobj', 'punct', 'nsubjpass', 'cc', 'conj', 'auxpass', 'root', 'prep', 'pobj', 'aux', 'infmod', 'dobj', 'prep', 'nn', 'pobj', 'prep', 'det', 'amod', 'punct', 'amod', 'pobj', 'punct', 'prep', 'pobj', 'aux', 'infmod', 'det', 'nn', 'nn', 'dobj', 'cc', 'poss', 'nn', 'conj', 'prep', 'det', 'pobj', 'punct', 'prep', 'pobj', 'prep', 'det', 'amod', 'poss', 'possessive', 'pobj', 'prep', 'nn', 'pobj', 'punct', 'nsubj', 'cc', 'conj', 'aux', 'advmod', 'cop', 'det', 'amod', 'poss', 'possessive', 'root', 'prep', 'pobj', 'punct', 'partmod', 'det', 'amod', 'xcomp', 'prep', 'det', 'nn', 'pobj', 'punct', 'prep', 'pobj', 'num', 'punct', 'nn', 'punct', 'prep', 'punct', 'pobj', 'nn', 'nsubj', 'cc', 'nn', 'nn', 'nn', 'conj', 'prep', 'det', 'nn', 'nn', 'pobj', 'root', 'det', 'amod', 'punct', 'nn', 'dobj', 'prep', 'nn', 'nn', 'pobj', 'prep', 'det', 'nn', 'nn', 'pobj', 'punct', 'prep', 'amod', 'pobj', 'punct', 'det', 'nsubj', 'root', 'mark', 'det', 'amod', 'nsubj', 'aux', 'ccomp', 'nn', 'dobj', 'cc', 'npadvmod', 'punct', 'amod', 'conj', 'aux', 'xcomp', 'det', 'nn', 'nn', 'dobj', 'prep', 'amod', 'amod', 'pobj', 'prep', 'det', 'pobj', 
'aux', 'advmod', 'infmod', 'cc', 'conj', 'det', 'dobj', 'punct', 'partmod', 'det', 'nn', 'nn', 'nsubj', 'cc', 'det', 'conj', 'aux', 'xcomp', 'dobj', 'punct', 'prep', 'det', 'nn', 'nn', 'pobj', 'prep', 'pobj', 'punct', 'det', 'nsubj', 'aux', 'infmod', 'punct', 'discourse', 'punct', 'dep', 'punct', 'cop', 'det', 'advmod', 'amod', 'root', 'prep', 'det', 'nn', 'pobj', 'prep', 'pobj', 'punct', 'discourse', 'punct', 'prep', 'det', 'pobj', 'prep', 'nn', 'pobj', 'punct', 'prep', 'pobj', 'aux', 'infmod', 'prt', 'punct', 'dep', 'det', 'dobj', 'prep', 'poss', 'pobj', 'punct', 'nsubj', 'root', 'det', 'amod', 'dobj', 'punct', 'nsubj', 'root', 'aux', 'xcomp', 'dobj', 'prep', 'det', 'pobj', 'punct', 'conj', 'prep', 'det', 'pobj', 'punct', 'cc', 'conj', 'prep', 'det', 'pobj', 'punct', 'aux', 'conj', 'prt', 'nn', 'dobj', 'prep', 'det', 'pobj', 'aux', 'infmod', 'det', 'nn', 'dobj', 'prep', 'nn', 'pobj', 'punct', 'discourse', 'punct', 'advmod', 'mark', 'punct', 'discourse', 'punct', 'aux', 'advcl', 'prt', 'poss', 'dobj', 'prep', 'pobj', 'dep', 'punct', 'prep', 'pobj', 'punct', 'nsubj', 'root', 'prep', 'det', 'pobj', 'punct', 'discourse', 'punct', 'aux', 'advmod', 'xcomp', 'poss', 'dobj', 'prep', 'det', 'nn', 'pobj', 'punct', 'root', 'punct', 'cc', 'punct', 'nsubj', 'root', 'prep', 'predet', 'det', 'amod', 'amod', 'cc', 'conj', 'pobj', 'advmod', 'det', 'nn', 'nn', 'nsubj', 'partmod', 'prep', 'det', 'amod', 'nn', 'pobj', 'punct', 'discourse', 'punct', 'dep', 'punct', 'discourse', 'punct', 'det', 'amod', 'dobj', 'partmod', 'det', 'nn', 'nn', 'dep', 'punct', 'det', 'nsubj', 'prep', 'det', 'amod', 'pobj', 'root', 'amod', 'dobj', 'prep', 'nn', 'pobj', 'cc', 'nn', 'conj', 'prep', 'nn', 'nn', 'nn', 'pobj', 'punct', 'prep', 'amod', 'pobj', 'cc', 'amod', 'conj', 'prep', 'amod', 'pobj', 'punct', 'det', 'nn', 'nsubjpass', 'partmod', 'prep', 'pobj', 'auxpass', 'root', 'punct', 'prep', 'pobj', 'num', 'punct', 'num', 'punct', 'det', 'nn', 'nn', 'nsubjpass', 'aux', 'infmod', 'det', 'nn', 'dobj', 
'punct', 'advmod', 'partmod', 'prep', 'nn', 'pobj', 'punct', 'nn', 'conj', 'punct', 'cc', 'nn', 'conj', 'punct', 'auxpass', 'root', 'prep', 'pobj', 'cc', 'det', 'conj', 'prep', 'det', 'nn', 'nn', 'pobj', 'punct', 'root', 'cop', 'nsubj', 'punct', 'conj', 'cc', 'conj', 'prep', 'det', 'pobj', 'punct', 'nsubj', 'root', 'aux', 'ccomp', 'prep', 'det', 'amod', 'poss', 'possessive', 'nn', 'pobj', 'prep', 'nn', 'pobj', 'prep', 'nn', 'pobj', 'punct', 'cc', 'aux', 'conj', 'det', 'dobj', 'prep', 'poss', 'possessive', 'pobj', 'cc', 'conj', 'punct', 'nsubj', 'root', 'aux', 'xcomp', 'det', 'dobj', 'punct', 'discourse', 'punct', 'det', 'nn', 'nn', 'nsubjpass', 'auxpass', 'root', 'prep', 'num', 'pobj', 'punct', 'dep', 'prep', 'pobj', 'num', 'punct', 'prep', 'pobj', 'num', 'prep', 'pobj', 'num', 'punct', 'det', 'amod', 'nsubj', 'prep', 'det', 'pobj', 'root', 'aux', 'xcomp', 'det', 'nn', 'dobj', 'punct'], 'head': [11, 0, 11, 5, 3, 11, 5, 6, 7, 8, 11, None, 13, 11, 11, 16, 26, 16, 19, 17, 19, 22, 24, 22, 19, 26, None, 26, 32, 30, 31, 32, 35, 35, 35, None, 35, 35, 41, 38, 38, 37, 41, 45, 45, 42, 45, 48, 46, 35, 51, 62, 51, 51, 53, 60, 60, 60, 60, 60, 54, 62, None, 64, 62, 64, 67, 65, 69, 62, 69, 74, 73, 74, 70, 62, 78, 78, 80, 80, None, 85, 83, 85, 85, 80, 85, 85, 89, 87, 89, 92, 90, 89, 95, 89, 95, 98, 96, 80, 104, 100, 100, 104, None, 104, 105, 108, 106, 108, 109, 112, 110, 108, 118, 118, 118, 118, 113, 104, 104, 120, 123, 121, 127, 126, 127, 123, 127, 131, 131, 127, 123, 134, 132, 104, 158, 136, 137, 141, 141, 143, 141, 138, 143, 146, 144, 158, 158, 148, 148, 158, 158, 158, 156, 156, 158, 156, None, 158, 159, 158, 158, 165, 165, 162, 165, 169, 169, 166, 158, 192, 171, 172, 192, 181, 175, 175, 177, 177, 181, 192, 181, 184, 186, 186, 181, 181, 191, 190, 191, 187, None, 197, 196, 196, 197, 192, 192, 201, 201, 198, 201, 206, 206, 206, 202, 201, 201, 210, 208, 192, 213, 214, None, 220, 218, 218, 220, 220, 214, 222, 220, 222, 226, 226, 227, 222, 229, 220, 233, 233, 233, 229, 229, 237, 
237, 234, 237, 240, 238, 243, 243, 240, 243, 243, 247, 245, 229, 229, 253, 252, 253, 258, 253, 256, 253, 258, 249, 258, 214, 282, 265, 264, 265, 261, 265, 266, 282, 270, 282, 272, 270, 272, 272, 272, 272, 282, 282, 282, 281, 282, None, 282, 286, 286, 283, 286, 287, 282, 313, 313, 313, 294, 292, 294, 297, 295, 313, 313, 299, 302, 300, 302, 302, 302, 307, 305, 305, 310, 308, 313, 313, None, 316, 316, 313, 313, 319, None, 321, 319, 321, 321, 325, 323, 321, 321, 327, 330, 328, 321, 321, 321, 333, 336, 334, 321, 339, 321, 339, 342, 339, 339, 345, 343, 347, 345, 350, 350, 347, 350, 353, 351, 347, 347, 347, 363, 363, 363, 363, 363, 363, 347, 363, 366, 363, 366, 367, 368, 319, 375, 371, 375, 375, None, 375, 378, 376, 375, 375, 375, 384, 384, 375, 386, 384, 386, 390, 390, 387, 375, None, 392, 397, 397, 397, None, 397, 405, 405, 405, 405, 402, 402, 398, 420, 410, 409, 410, 420, 410, 411, 416, 416, 416, 412, 420, 420, 420, 397, 420, 420, 420, 426, 426, 420, 426, 431, 430, 431, 427, 397, 434, 439, 434, 438, 438, 435, None, 441, 439, 439, 444, 442, 444, 447, 444, 444, 450, 451, 452, 448, 439, 471, 456, 454, 456, 459, 456, 456, 462, 460, 471, 466, 466, 471, 466, 467, 468, 471, None, 471, 503, 473, 474, 474, 474, 503, 482, 482, 482, 503, 484, 482, 487, 487, 484, 482, 490, 482, 490, 493, 491, 493, 496, 493, 493, 493, 500, 493, 503, 503, None, 503, 504, 505, 508, 505, 508, 513, 512, 513, 509, 503, None, 515, 515, 517, 517, 517, 517, 517, 524, 522, 515, 527, None, 529, 527, 529, 533, 533, 536, 533, 536, 530, 536, 539, 537, 539, 542, 540, 529, 529, 546, 529, 548, 546, 548, 552, 550, 549, 552, 552, 527, 557, None, 559, 557, 561, 559, 557, 570, 570, 568, 567, 568, 570, 570, None, 570, 573, 571, 570, 593, 575, 576, 577, 593, 593, 580, 581, 580, 583, 584, 593, 589, 589, 593, 589, 592, 590, None, 595, 593, 598, 598, 595, 593], 'head2span': [[1, 1, 2], [267, 267, 268], [477, 477, 478], [3, 3, 5], [18, 18, 19], [288, 288, 289], [550, 550, 552], [9, 9, 10], [22, 21, 24], [309, 309, 310], 
[312, 312, 313], [374, 374, 375], [385, 385, 386], [31, 29, 32], [74, 71, 75], [127, 124, 128], [129, 129, 130], [191, 188, 192], [253, 250, 254], [410, 407, 417], [451, 449, 452], [513, 510, 514], [32, 28, 33], [256, 255, 257], [45, 43, 46], [141, 139, 143], [156, 154, 158], [438, 436, 439], [533, 531, 535], [48, 47, 49], [109, 109, 113], [64, 63, 68], [78, 76, 79], [143, 139, 147], [85, 81, 99], [134, 133, 135], [131, 129, 132], [169, 167, 170], [181, 175, 182], [444, 443, 445], [493, 492, 494], [186, 183, 187], [447, 446, 448], [496, 495, 497], [197, 193, 198], [213, 212, 214], [265, 262, 268], [426, 424, 432], [524, 523, 525], [568, 565, 569], [592, 591, 593], [282, 279, 289], [548, 547, 555], [561, 560, 562], [316, 314, 317], [318, 318, 319], [434, 433, 439], [350, 348, 354], [365, 365, 366], [390, 388, 391], [487, 485, 488], [598, 596, 599], [517, 517, 525], [526, 526, 527], [556, 556, 557], [577, 577, 579], [581, 581, 583]], 'word_clusters': [[1, 267, 477], [3, 18, 288, 550], [9, 22, 309, 312, 374, 385], [31, 74, 127, 129, 191, 253, 410, 451, 513], [32, 256], [45, 141, 156, 438, 533], [48, 109], [64, 78, 143], [85, 134], [131, 169], [181, 444, 493], [186, 447, 496], [197, 213], [265, 426, 524, 568, 592], [282, 548, 561], [316, 318, 434], [350, 365, 390], [487, 598], [517, 526, 556], [577, 581]], 'span_clusters': [[[1, 2], [267, 268], [477, 478]], [[3, 5], [18, 19], [288, 289], [550, 552]], [[9, 10], [21, 24], [309, 310], [312, 313], [374, 375], [385, 386]], [[29, 32], [71, 75], [124, 128], [129, 130], [188, 192], [250, 254], [407, 417], [449, 452], [510, 514]], [[28, 33], [255, 257]], [[43, 46], [139, 143], [154, 158], [436, 439], [531, 535]], [[47, 49], [109, 113]], [[63, 68], [76, 79], [139, 147]], [[81, 99], [133, 135]], [[129, 132], [167, 170]], [[175, 182], [443, 445], [492, 494]], [[183, 187], [446, 448], [495, 497]], [[175, 192], [443, 453]], [[193, 198], [212, 214]], [[262, 268], [424, 432], [523, 525], [565, 569], [591, 593]], [[279, 289], [547, 
555], [560, 562]], [[314, 317], [318, 319], [433, 439]], [[348, 354], [365, 366], [388, 391]], [[485, 488], [596, 599]], [[517, 525], [526, 527], [556, 557]], [[577, 579], [581, 583]]]}

[175, 192] in span_clusters is not present in head2span. The closest entry is [181, 175, 182], and [175, 182] is itself a span that belongs to another coreference cluster. Do you have any idea why this happens? Is it because the head words overlap, so the preprocessing script keeps only one span?

@vdobrovolskii
Owner

No worries! I am glad this work might be interesting to anyone :)

"head2span" contains tuples [head, span_start, span_end], so indeed the span in question (175, 192) has been filtered out.
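To list every span affected in a document, something like the following could be used. This is only a minimal sketch, not code from the repository: `missing_spans` is a hypothetical helper, and it assumes a document is a dict with the `head2span` and `span_clusters` keys shown in the sample above. The data here is a trimmed subset of that sample.

```python
def missing_spans(doc):
    """Return spans listed in span_clusters that have no head2span entry."""
    # head2span entries are [head, span_start, span_end]
    covered = {(start, end) for _head, start, end in doc["head2span"]}
    return [
        (start, end)
        for cluster in doc["span_clusters"]
        for start, end in cluster
        if (start, end) not in covered
    ]


# Trimmed subset of the sample document (hypothetical reduction for brevity)
doc = {
    "head2span": [
        [181, 175, 182],
        [444, 443, 445],
        [186, 183, 187],
        [447, 446, 448],
    ],
    "span_clusters": [
        [[175, 182], [443, 445]],
        [[183, 187], [446, 448]],
        [[175, 192], [443, 453]],
    ],
}

print(missing_spans(doc))  # [(175, 192), (443, 453)]
```

On this subset both spans of the third cluster come back as missing, because neither has an entry in `head2span`.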

I checked the case and (175, 192) is "Commander - in - chief Zhu De and Vice Commander Peng Dehuai of the Eighth Route Army", which is what I mentioned in issue #2: because the span contains two conjuncts ("Commander - in - chief Zhu De" and "Vice Commander Peng Dehuai"), the heads of the left conjunct and the whole phrase are the same.

In such cases the data preparation script just picks the shortest option.
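As a rough illustration of that rule (a sketch under assumptions, not the actual data preparation script), picking the shortest span per head could look like this. The function name `keep_shortest_per_head` and the candidate format `(head, start, end)` are made up for the example.

```python
def keep_shortest_per_head(candidates):
    """Keep one span per head word, preferring the shortest span.

    `candidates` is an iterable of (head, start, end) triples; this
    format is an assumption made for illustration only.
    """
    best = {}
    for head, start, end in candidates:
        current = best.get(head)
        # Replace the stored span only if the new one is strictly shorter
        if current is None or (end - start) < (current[1] - current[0]):
            best[head] = (start, end)
    return [[head, start, end] for head, (start, end) in best.items()]


candidates = [
    (181, 175, 182),  # "Commander - in - chief Zhu De"
    (181, 175, 192),  # the whole coordinated phrase, same head
    (186, 183, 187),  # "Vice Commander Peng Dehuai"
]
print(keep_shortest_per_head(candidates))
# [[181, 175, 182], [186, 183, 187]]
```

The longer span (175, 192) shares head 181 with (175, 182), so only the shorter one survives.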

As I have already said, if this is critical, you can try changing the data preparation script to pick something else as the span head in such cases, for instance the conjunction itself ("and" in this example). I think this should work, but it will require retraining the model, because it has not seen such heads and has not been trained to predict such spans.

@yhcc
Author

yhcc commented Nov 14, 2021

So during evaluation the code uses span_clusters (when word_level_conll is False). Got it, thank you for your patient reply.

@vdobrovolskii
Owner

Yes, during evaluation span_clusters are used, so all the spans that cannot be predicted are treated as false negatives.
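To get a feel for how much this costs, one could estimate the share of gold mentions that remain predictable. The sketch below is a hypothetical helper, not part of the repository, and it only gives a mention-level proxy rather than the CoNLL coreference score. It assumes the same dict layout as the sample in this issue.

```python
def mention_recall_ceiling(doc):
    """Fraction of gold spans in span_clusters that have a head2span entry.

    Spans without an entry cannot be predicted, so they count as
    false negatives at the mention level.
    """
    covered = {(start, end) for _head, start, end in doc["head2span"]}
    gold = [(start, end) for cluster in doc["span_clusters"] for start, end in cluster]
    if not gold:
        return 1.0  # nothing to miss in a document without gold spans
    reachable = sum(1 for span in gold if span in covered)
    return reachable / len(gold)


# Trimmed subset of the sample document (hypothetical reduction for brevity)
doc = {
    "head2span": [
        [181, 175, 182],
        [444, 443, 445],
        [186, 183, 187],
        [447, 446, 448],
    ],
    "span_clusters": [
        [[175, 182], [443, 445]],
        [[183, 187], [446, 448]],
        [[175, 192], [443, 453]],
    ],
}

print(mention_recall_ceiling(doc))
```

On this subset four of the six gold spans are reachable, so the ceiling is 4/6.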

Feel free to ask any further questions :)
