mfa for multi speaker. #3

leon2milan · 2022-06-27T09:06:45Z

The code groups MFA inputs for better parallelism. With a multi-speaker corpus, this can go wrong.
For the input g_uang3 zh_ou1 n_v3 d_a4 x_ve2 sh_eng1 d_eng1 sh_an1 sh_i1 l_ian2 s_i4 t_ian1 j_ing3 f_ang1 zh_ao3 d_ao4 i2 s_i4 n_v3 sh_i1,
the resulting TextGrid is:

	item [1]:
		class = "IntervalTier"
		name = "words"
		xmin = 0.0
		xmax = 9.4444
		intervals: size = 56
			intervals [1]:
				xmin = 0
				xmax = 0.5700000000000001
				text = ""
			intervals [2]:
				xmin = 0.5700000000000001
				xmax = 0.61
				text = "eng"
			intervals [3]:
				xmin = 0.61
				xmax = 0.79
				text = "s_an1"
			intervals [4]:
				xmin = 0.79
				xmax = 0.89
				text = "eng"
			intervals [5]:
				xmin = 0.89
				xmax = 1.06
				text = "i1"
			intervals [6]:
				xmin = 1.06
				xmax = 1.24
				text = "eng"
			intervals [7]:
				xmin = 1.24
				xmax = 1.3
				text = ""
			intervals [8]:
				xmin = 1.3
				xmax = 1.36
				text = "s_an1"
			intervals [9]:
				xmin = 1.36
				xmax = 1.42
				text = ""
			intervals [10]:
				xmin = 1.42
				xmax = 1.49
				text = "eng"
			intervals [11]:
				xmin = 1.49
				xmax = 1.67
				text = "s_i4"
			intervals [12]:
				xmin = 1.67
				xmax = 1.78
				text = "eng"
			intervals [13]:
				xmin = 1.78
				xmax = 1.91
				text = ""
			intervals [14]:
				xmin = 1.91
				xmax = 1.96
				text = "er4"
			intervals [15]:
				xmin = 1.96
				xmax = 2.06
				text = "eng"
			intervals [16]:
				xmin = 2.06
				xmax = 2.19
				text = ""
			intervals [17]:
				xmin = 2.19
				xmax = 2.35
				text = "i1"
			intervals [18]:
				xmin = 2.35
				xmax = 2.53
				text = "eng"
			intervals [19]:
				xmin = 2.53
				xmax = 3.03
				text = "i1"
			intervals [20]:
				xmin = 3.03
				xmax = 3.42
				text = "eng"
			intervals [21]:
				xmin = 3.42
				xmax = 3.48
				text = "i1"
			intervals [22]:
				xmin = 3.48
				xmax = 3.6
				text = ""
			intervals [23]:
				xmin = 3.6
				xmax = 3.64
				text = "eng"
			intervals [24]:
				xmin = 3.64
				xmax = 3.86
				text = "i1"
			intervals [25]:
				xmin = 3.86
				xmax = 3.99
				text = "eng"
			intervals [26]:
				xmin = 3.99
				xmax = 4.59
				text = ""
			intervals [27]:
				xmin = 4.59
				xmax = 4.869999999999999
				text = "er4"
			intervals [28]:
				xmin = 4.869999999999999
				xmax = 4.9799999999999995
				text = "eng"
			intervals [29]:
				xmin = 4.9799999999999995
				xmax = 5.1899999999999995
				text = "s_i4"
			intervals [30]:
				xmin = 5.1899999999999995
				xmax = 5.34
				text = ""
			intervals [31]:
				xmin = 5.34
				xmax = 5.43
				text = "eng"
			intervals [32]:
				xmin = 5.43
				xmax = 5.6
				text = ""
			intervals [33]:
				xmin = 5.6
				xmax = 5.76
				text = "i1"
			intervals [34]:
				xmin = 5.76
				xmax = 6.279999999999999
				text = "eng"
			intervals [35]:
				xmin = 6.279999999999999
				xmax = 6.359999999999999
				text = "s_an1"
			intervals [36]:
				xmin = 6.359999999999999
				xmax = 6.47
				text = ""
			intervals [37]:
				xmin = 6.47
				xmax = 6.6
				text = "eng"
			intervals [38]:
				xmin = 6.6
				xmax = 6.9399999999999995
				text = "i1"
			intervals [39]:
				xmin = 6.9399999999999995
				xmax = 7.039999999999999
				text = "eng"
			intervals [40]:
				xmin = 7.039999999999999
				xmax = 7.289999999999999
				text = "s_an1"
			intervals [41]:
				xmin = 7.289999999999999
				xmax = 7.369999999999999
				text = "eng"
			intervals [42]:
				xmin = 7.369999999999999
				xmax = 7.6
				text = "s_i4"
			intervals [43]:
				xmin = 7.6
				xmax = 7.699999999999999
				text = "eng"
			intervals [44]:
				xmin = 7.699999999999999
				xmax = 7.869999999999999
				text = ""
			intervals [45]:
				xmin = 7.869999999999999
				xmax = 8.049999999999999
				text = "er4"
			intervals [46]:
				xmin = 8.049999999999999
				xmax = 8.26
				text = ""
			intervals [47]:
				xmin = 8.26
				xmax = 8.299999999999999
				text = "eng"
			intervals [48]:
				xmin = 8.299999999999999
				xmax = 8.36
				text = "s_i4"
			intervals [49]:
				xmin = 8.36
				xmax = 8.389999999999999
				text = ""
			intervals [50]:
				xmin = 8.389999999999999
				xmax = 8.42
				text = "eng"
			intervals [51]:
				xmin = 8.42
				xmax = 8.45
				text = ""
			intervals [52]:
				xmin = 8.45
				xmax = 8.59
				text = "s_an1"
			intervals [53]:
				xmin = 8.59
				xmax = 8.83
				text = ""
			intervals [54]:
				xmin = 8.83
				xmax = 9.1
				text = "eng"
			intervals [55]:
				xmin = 9.1
				xmax = 9.44
				text = "i1"
			intervals [56]:
				xmin = 9.44
				xmax = 9.4444
				text = ""
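
Most labels in the tier above (for example eng, s_an1, i1 and er4) do not appear in the input transcript, which suggests the alignment was produced against a different utterance. A minimal sketch of how such a mismatch could be detected automatically is given below. The function names `word_labels` and `unexpected_labels` are hypothetical and not part of the repository, and the regular expression simply collects every `text = "..."` field, so it assumes the TextGrid text passed in contains only the words tier.

```python
import re


def word_labels(textgrid_text):
    """Return the non-empty interval labels found in TextGrid text.

    Assumption: the text contains only the tier of interest, because
    every `text = "..."` field is collected regardless of tier.
    """
    labels = re.findall(r'text\s*=\s*"([^"]*)"', textgrid_text)
    return [label for label in labels if label]


def unexpected_labels(expected_tokens, aligned_labels):
    """Return the sorted aligned labels that never occur in the transcript."""
    expected = set(expected_tokens)
    return sorted({label for label in aligned_labels if label not in expected})


# Transcript from the report above.
transcript = (
    "g_uang3 zh_ou1 n_v3 d_a4 x_ve2 sh_eng1 d_eng1 sh_an1 sh_i1 l_ian2 "
    "s_i4 t_ian1 j_ing3 f_ang1 zh_ao3 d_ao4 i2 s_i4 n_v3 sh_i1"
).split()

# First three intervals of the reported TextGrid.
sample = '''
intervals [1]:
    xmin = 0
    xmax = 0.5700000000000001
    text = ""
intervals [2]:
    xmin = 0.5700000000000001
    xmax = 0.61
    text = "eng"
intervals [3]:
    xmin = 0.61
    xmax = 0.79
    text = "s_an1"
'''

mismatches = unexpected_labels(transcript, word_labels(sample))
if mismatches:
    print("Labels not in transcript:", mismatches)
```

A non-empty result for an utterance is a hint that its audio was paired with the wrong transcript, which is worth checking before training on the alignments.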


yerfor · 2022-06-28T15:55:48Z

The MFA code is borrowed from the official NATSpeech repo. Unfortunately, I have not studied the MFA module in depth. We encourage you to push your commits for multi-speaker Chinese corpora. Thanks a lot!

leon2milan · 2022-07-04T02:28:53Z

Hello, I pushed a commit, but I don't know whether it fits your plans. The mfa_group parameter seems unnecessary: it doesn't improve training speed compared with wetts (which uses version 2.1.0). A spk_name parameter also needs to be added.
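
The commit itself is not shown in this thread, so the following is only a rough sketch of the idea of keying MFA inputs by speaker rather than by an arbitrary group index. It assumes each item is a dict with hypothetical `item_name` and `spk_name` fields, and it relies on the MFA convention that each speaker's files live in their own subdirectory of the corpus. The function names and the `mfa_inputs` root are illustrative, not the repository's actual API.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_speaker(items):
    """Map each speaker name to the list of its utterance names."""
    groups = defaultdict(list)
    for item in items:
        groups[item["spk_name"]].append(item["item_name"])
    return dict(groups)


def mfa_input_paths(items, root="mfa_inputs"):
    """Return {item_name: (wav_path, lab_path)} with one folder per speaker.

    Only the paths are computed here; copying audio and writing the
    transcript files is left to the caller.
    """
    paths = {}
    for spk_name, names in group_by_speaker(items).items():
        speaker_dir = PurePosixPath(root) / spk_name
        for name in names:
            paths[name] = (
                str(speaker_dir / f"{name}.wav"),
                str(speaker_dir / f"{name}.lab"),
            )
    return paths


# Hypothetical metadata for three utterances from two speakers.
items = [
    {"item_name": "utt_001", "spk_name": "spk_a"},
    {"item_name": "utt_002", "spk_name": "spk_b"},
    {"item_name": "utt_003", "spk_name": "spk_a"},
]

for name, (wav_path, lab_path) in mfa_input_paths(items).items():
    print(name, wav_path, lab_path)
```

Keeping the speaker name in the path means each utterance stays paired with its own transcript, and the aligner can still apply per-speaker adaptation.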

yerfor closed this as completed Jul 29, 2022

windowxiaoming mentioned this issue Sep 2, 2022

pinyin preprocess problem #8

Closed
