read.structure has problems with names that start with a number #160

romunov · 2016-11-28T14:27:33Z

Elizabeth posted a question she had with importing a structure file. Here is a reproducible example. Samples C_KH1059 and M_KH1834 are expected to have allele .33 at locus 1401_25, but are NA instead.

library(adegenet)

cat(print("		1_25	8_54	1358_15	1363_12	1368_57	1369_41	1372_14	1373_9	1377_42	1378_53	1379_10	1382_37	1386_27	1398_46	1400_9	1401_25	1403_13	1404_17	1409_42	1416_48	1419_11	1421_14	1423_5	1424_74	1426_55	1429_46	1432_23	1435_30	1436_7	1438_9	1443_37
A_KH1584	A	1	4	4	1	1	3	2	4	4	2	3	3	2	4	1	3	1	1	2	3	1	4	4	3	2	2	3	4	4	4	2
A_KH1584	A	1	4	4	1	1	3	2	4	4	4	3	3	4	4	1	3	1	3	2	3	3	4	4	3	4	2	3	4	4	4	2
C_KH1059	C	0	4	4	1	1	3	2	4	4	2	1	3	2	4	1	3	1	3	2	3	3	2	4	3	2	2	3	2	4	4	2
C_KH1059	C	0	4	4	1	1	3	2	4	4	4	3	3	2	4	1	3	1	3	2	3	3	4	4	3	2	2	3	4	4	4	2
M_KH1834	M	0	2	2	1	1	3	2	4	4	2	3	3	2	4	1	3	1	1	2	3	3	4	4	3	2	2	3	2	4	4	2
M_KH1834	M	0	4	4	1	3	3	2	4	4	2	3	3	2	4	1	3	1	3	2	3	3	4	4	3	2	4	3	4	4	4	2
M_KH1837	M	1	4	4	1	1	3	2	4	4	0	3	3	2	2	1	3	1	1	2	3	3	4	4	3	4	2	3	4	4	4	2
M_KH1837	M	1	4	4	1	3	3	2	4	4	0	3	3	4	4	1	3	1	3	2	3	3	4	4	3	4	2	3	4	4	4	2"), 
    file = "elizabeth_starts_with_number.stru")

xy1 <- read.structure("elizabeth_starts_with_number.stru", NA.char="0",
                     n.ind = 4, n.loc = 31, onerowperind = FALSE,
                     col.lab = 1, col.pop = 2, row.marknames = 1,
                     sep = "\t", col.others = 0)

x1 <- tab(xy1)

# 1401_25.33 is NA but should be 1 for samples C_KH1059 and M_KH1834
x1[, grepl("1401_25", colnames(x1)), drop = FALSE]

#          1401_25.33
# A_KH1584          1
# C_KH1059         NA
# M_KH1834         NA
# M_KH1837          1

If I rename the columns so that they start with a letter, the import works as expected.

cat(print("		X1_25	X8_54	X1358_15	X1363_12	X1368_57	X1369_41	X1372_14	X1373_9	X1377_42	X1378_53	X1379_10	X1382_37	X1386_27	X1398_46	X1400_9	X1401_25	X1403_13	X1404_17	X1409_42	X1416_48	X1419_11	X1421_14	X1423_5	X1424_74	X1426_55	X1429_46	X1432_23	X1435_30	X1436_7	X1438_9	X1443_37
A_KH1584	A	1	4	4	1	1	3	2	4	4	2	3	3	2	4	1	3	1	1	2	3	1	4	4	3	2	2	3	4	4	4	2
A_KH1584	A	1	4	4	1	1	3	2	4	4	4	3	3	4	4	1	3	1	3	2	3	3	4	4	3	4	2	3	4	4	4	2
C_KH1059	C	0	4	4	1	1	3	2	4	4	2	1	3	2	4	1	3	1	3	2	3	3	2	4	3	2	2	3	2	4	4	2
C_KH1059	C	0	4	4	1	1	3	2	4	4	4	3	3	2	4	1	3	1	3	2	3	3	4	4	3	2	2	3	4	4	4	2
M_KH1834	M	0	2	2	1	1	3	2	4	4	2	3	3	2	4	1	3	1	1	2	3	3	4	4	3	2	2	3	2	4	4	2
M_KH1834	M	0	4	4	1	3	3	2	4	4	2	3	3	2	4	1	3	1	3	2	3	3	4	4	3	2	4	3	4	4	4	2
M_KH1837	M	1	4	4	1	1	3	2	4	4	0	3	3	2	2	1	3	1	1	2	3	3	4	4	3	4	2	3	4	4	4	2
M_KH1837	M	1	4	4	1	3	3	2	4	4	0	3	3	4	4	1	3	1	3	2	3	3	4	4	3	4	2	3	4	4	4	2"), 
    file = "elizabeth_starts_with_letter.stru")

xy2 <- read.structure("elizabeth_starts_with_letter.stru", NA.char="0",
                     n.ind = 4, n.loc = 31, onerowperind = FALSE,
                     col.lab = 1, col.pop = 2, row.marknames = 1,
                     sep = "\t", col.others = 0)

x2 <- tab(xy2)
x2[, grepl("1401_25", colnames(x2)), drop = FALSE]

#          X1401_25.33
# A_KH1584           1
# C_KH1059           1
# M_KH1834           1
# M_KH1837           1

unlink("elizabeth_starts_with_letter.stru")
unlink("elizabeth_starts_with_number.stru")


thibautjombart · 2016-12-02T15:53:36Z

OK thanks for looking into this. I am struggling to catch up with things so won't have time to look into it, though the fix is probably an easy one.. :-/

romunov · 2016-12-02T16:47:39Z

I'll have a look ASAP.

thibautjombart · 2016-12-02T17:28:42Z

You rock!

romunov · 2016-12-04T11:39:56Z

The problem was in df2genind, line 322. Because locus names were matched partially, other loci also became candidates for being set to NA. For instance, locus 1_25 would also match 1401_25. I made the matching more explicit by putting ^ in front of the pattern, so that it has to match from the start of the string. This is also why adding a letter to the beginning of the locus names made the matching work as expected. I also added a test case for this.
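For readers who want to see the effect in isolation, here is a minimal base R sketch of the partial-matching problem. It is not the code from df2genind; the vector `allele_cols` and the other variable names are made up for illustration, and only the locus names are taken from the example above. Note that in the example data C_KH1059 and M_KH1834 are exactly the two samples with a missing value (0) at locus 1_25.

```r
# Hypothetical allele column names, in the "locus.allele" form that tab() shows.
allele_cols <- c("1_25.1", "8_54.4", "1401_25.33")

# Unanchored pattern: "1_25" is also a substring of "1401_25.33",
# so a lookup meant for locus 1_25 picks up locus 1401_25 as well.
unanchored <- grep("1_25", allele_cols, value = TRUE)
print(unanchored)
# [1] "1_25.1"     "1401_25.33"

# Anchored pattern: "^" forces the match to begin at the start of the name.
anchored <- grep("^1_25", allele_cols, value = TRUE)
print(anchored)
# [1] "1_25.1"

# With a letter prefix, "X1_25" is no longer a substring of "X1401_25.33",
# which is why renaming the loci hid the problem.
prefixed_cols <- paste0("X", allele_cols)
prefixed <- grep("X1_25", prefixed_cols, value = TRUE)
print(prefixed)
# [1] "X1_25.1"
```

Anchoring only the start would still match a locus whose name begins with the same characters (for example a hypothetical locus 1_255), so a stricter pattern may be worth considering if such names occur in your data.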

thibautjombart · 2016-12-05T12:04:13Z

This is great, I especially like the added test, thanks for this!

romunov changed the title ~~read.structure has problems with names that can be coerced to numeric~~ read.structure has problems with names that start with a number Nov 28, 2016

romunov closed this as completed in 865db3d Dec 4, 2016

zkamvar mentioned this issue Apr 30, 2019

unsettling discrepancy in allele identity between original datafile and converted genind object #256

Closed
