@thompsonb, I'm trying to replicate the work done in your paper, in particular the results in Table 1.
How did you convert the format of the dataset that you have in the "bleualign_data" directory to hunalign's ladder-style format?
Is there a script to do that, or did you do it manually?
I converted from hunalign ladder-style to the bleualign format. I believe this is the code I used:
def reformat(ladder_file, src_len, tgt_len):
    """Convert a hunalign ladder file into a list of (src_ids, tgt_ids) tuples."""
    alignments = []
    current_alignment = ([], [])
    prev_a1, prev_a2 = None, None
    with open(ladder_file, 'r', encoding='utf-8') as f:
        for line in f:
            fields = line.strip().split('\t')
            a1, a2 = int(fields[0]), int(fields[1])
            # start a new group only when both indices change
            if a1 != prev_a1 and a2 != prev_a2 and current_alignment != ([], []):
                alignments.append(current_alignment)
                current_alignment = ([], [])
            current_alignment[0].append(a1)
            current_alignment[1].append(a2)
            prev_a1, prev_a2 = a1, a2
    # note: the group still open after the last line is not appended

    alignments2 = []
    xx, yy = [], []
    for a1, a2 in alignments:
        # drop repeated indices within a group
        x1 = sorted(set(a1))
        x2 = sorted(set(a2))
        alignments2.append((x1, x2))  # tuple of lists
        xx.extend(x1)
        yy.extend(x2)

    # add deletions/insertions (*not* in order)
    xx, yy = set(xx), set(yy)
    for x in range(src_len):
        if x not in xx:
            alignments2.append(([x], []))
    for y in range(tgt_len):
        if y not in yy:
            alignments2.append(([], [y]))
    return alignments2
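To make the grouping rule in the first loop easier to follow, here is a minimal sketch that applies the same rule to an in-memory list of index pairs instead of a file. The helper name `group_pairs` and the toy input are hypothetical, made up for illustration and not taken from the dataset or the original script. It returns the finished groups together with the group that is still open when the input ends, which the loop above does not append.

```python
def group_pairs(pairs):
    """Group (src, tgt) index pairs with the same rule as the first loop of reformat.

    Hypothetical helper for illustration only.
    """
    groups = []
    current = ([], [])
    prev_src, prev_tgt = None, None
    for src, tgt in pairs:
        # A new group starts only when BOTH indices differ from the previous pair.
        if src != prev_src and tgt != prev_tgt and current != ([], []):
            groups.append(current)
            current = ([], [])
        current[0].append(src)
        current[1].append(tgt)
        prev_src, prev_tgt = src, tgt
    # Return the unfinished group separately so it is visible to the caller.
    return groups, current


# Toy input (assumed, not real data): source index 1 appears with targets 1 and 2.
pairs = [(0, 0), (1, 1), (1, 2), (2, 3)]
groups, still_open = group_pairs(pairs)
print(groups)      # [([0], [0]), ([1, 1], [1, 2])]
print(still_open)  # ([2], [3])
```

The repeated source index in the second group is what the `sorted(set(...))` step in the second loop collapses, giving `([1], [1, 2])`.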