# metadata reorder

_reorder metadata columns in a CSV file_  
2016-10-24 Jeremy Douglass / WE1S

Solution notes:

-  http://stackoverflow.com/questions/33001490/python-re-ordering-columns-in-a-csv
-  http://stackoverflow.com/questions/16306819/python-edit-csv-headers

## file names

In [None]:
import csv

csv_in  = "metadata/metadata.csv"
csv_out = "metadata/metadata-dfrb.csv"

## column names

In [None]:
## infieldnames gives the new column names, listed in the original column order
## outfieldnames lists the same names in the new column order

infieldnames = 'id', 'journaltitle', 'pubdate', 'title', 'pagerange', 'author', 'volume', 'issue'
outfieldnames = 'id', 'title', 'author', 'journaltitle', 'volume', 'issue', 'pubdate', 'pagerange'

## copy metadata

In [None]:
## create reordered metadata file

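## open the output with 'w', not 'a': re-running the cell then overwrites the file
## instead of appending a second header and duplicate rows
## (on Python 3, also pass newline='' to both open() calls, as the csv docs recommend)
with open(csv_in, 'r') as infile, open(csv_out, 'w') as outfile:
    ## passing fieldnames renames the columns; DictReader then treats the first row as data
    reader = csv.DictReader(infile, fieldnames=infieldnames)
    ## so skip the outdated header row
    next(reader, None)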

    ## output dict needs a reordered list for new column ordering
    writer = csv.DictWriter(outfile, fieldnames=outfieldnames)
    ## write automatic header
    writer.writeheader()
    
    ## write each row to new file with remapped column order
    for row in reader:
        writer.writerow(row)

---------------




## NOTES

	IN  = id, publication, pubdate, title, articlebody, author, docUrl, wordcount

Specify new column names when loading; the old names in the header row are ignored:

> OLD NAME => NEW NAME
>  
> -  id => id  
> -  publication => journaltitle  
> -  pubdate => pubdate  
> -  title => title  
> -  articlebody => pagerange  
> -  author => author  
> -  docUrl => volume  
> -  wordcount => issue  

	infieldnames = 'id', 'journaltitle', 'pubdate', 'title', 'pagerange', 'author', 'volume', 'issue'
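
A minimal sketch of the renaming step, assuming Python 3 and an in-memory file. The sample row and its values are made up for illustration; only the column names come from the notes above. It shows that when `fieldnames` is passed, `DictReader` returns the old header row as ordinary data, which is why that row has to be skipped.

```python
import csv
import io

# Hypothetical sample in the original column order (values are invented).
sample = (
    "id,publication,pubdate,title,articlebody,author,docUrl,wordcount\n"
    "1,Some Paper,2016-10-24,A Title,body text,A. Author,http://example.com,123\n"
)

infieldnames = ('id', 'journaltitle', 'pubdate', 'title',
                'pagerange', 'author', 'volume', 'issue')

reader = csv.DictReader(io.StringIO(sample), fieldnames=infieldnames)

# With explicit fieldnames, the first row is not consumed as a header.
old_header = next(reader)
row = next(reader)

print(old_header['journaltitle'])  # the old name, read as a data value
print(row['journaltitle'])         # the value that was under 'publication'
print(row['volume'])               # the value that was under 'docUrl'
```

The new names are assigned purely by position, so `infieldnames` must list exactly one name per original column, in the original order.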

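Reorder the columns for output. The output list holds the same names in a different order: DictReader keys each row by column name, and DictWriter writes the values in the order of its own `fieldnames`, so no manual mapping is needed: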

> ORDER => REORDER
> 
> -  1 => 1  
> -  2 => 4  
> -  3 => 7  
> -  4 => 2  
> -  5 => 8  
> -  6 => 3  
> -  7 => 5  
> -  8 => 6  

	outfieldnames = 'id', 'title', 'author', 'journaltitle', 'volume', 'issue', 'pubdate', 'pagerange'
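
The ORDER => REORDER table can be checked against the two name lists, and the reordering itself tried on a single row. This is a sketch assuming Python 3; the row values are invented, and `lineterminator='\n'` is set only to keep the in-memory output easy to read.

```python
import csv
import io

infieldnames = ('id', 'journaltitle', 'pubdate', 'title',
                'pagerange', 'author', 'volume', 'issue')
outfieldnames = ('id', 'title', 'author', 'journaltitle',
                 'volume', 'issue', 'pubdate', 'pagerange')

# New 1-based position of each original column, in original order.
positions = [outfieldnames.index(name) + 1 for name in infieldnames]
print(positions)

# Hypothetical row keyed by the new names (values are invented).
row = {'id': '1', 'journaltitle': 'Some Paper', 'pubdate': '2016-10-24',
       'title': 'A Title', 'pagerange': 'body text', 'author': 'A. Author',
       'volume': 'http://example.com', 'issue': '123'}

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=outfieldnames, lineterminator='\n')
writer.writeheader()   # header follows outfieldnames
writer.writerow(row)   # values are looked up by key, then written in that order
output = buffer.getvalue()
print(output)
```

By default `DictWriter` raises `ValueError` if a row contains a key that is not in `fieldnames`, which catches a misspelled name in either list.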

### LINKS

-  http://stackoverflow.com/questions/16306819/python-edit-csv-headers   
-  http://stackoverflow.com/questions/17039539/replace-fieldnames-when-using-dictreader
-  http://stackoverflow.com/questions/20347766/pythonically-add-header-to-a-csv-file
-  http://stackoverflow.com/questions/2982023/writing-header-with-dictwriter-from-pythons-csv-module
-  http://stackoverflow.com/questions/35063137/how-to-rename-key-header-in-csv-dictreader
-  http://stackoverflow.com/questions/38089295/python-rename-header-row-after-w-writerow-is-finished
-  https://docs.python.org/2/library/csv.html

## OLD CODE

```python
with open(csv_in, 'r') as infile, open(csv_out, 'a') as outfile:
    # output dict needs a list for new column ordering
    fieldnames = 'id', 'title', 'docUrl', 'publication', 'wordcount', 'pubdate', 'articlebody', 'author'
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    # reorder the header first
    writer.writeheader()
    for row in csv.DictReader(infile):
        # writes the reordered rows to the new file
        writer.writerow(row)
```

## manual line snippet

```python
## rename by writing custom header
wtr = csv.writer(outfile)
wtr.writerow(['id', 'title', 'volume', 'journaltitle', 'issue', 'pubdate', 'pagerange', 'author'])
```
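
One way the manual header might combine with the old code is sketched below. It assumes Python 3 and an in-memory file; the sample values and the names `old_order`, `new_header`, and `result` are hypothetical. The rows are reordered under their original names, and the renamed header is written by hand in place of `writeheader()`.

```python
import csv
import io

# Hypothetical sample in the original column order (values are invented).
sample = (
    "id,publication,pubdate,title,articlebody,author,docUrl,wordcount\n"
    "1,Some Paper,2016-10-24,A Title,body text,A. Author,http://example.com,123\n"
)

# Original names in the old code's output order, and their replacements.
old_order = ('id', 'title', 'docUrl', 'publication',
             'wordcount', 'pubdate', 'articlebody', 'author')
new_header = ['id', 'title', 'volume', 'journaltitle',
              'issue', 'pubdate', 'pagerange', 'author']

outfile = io.StringIO()

# Write the renamed header manually instead of calling writeheader().
csv.writer(outfile, lineterminator='\n').writerow(new_header)

# Reorder the data rows using the original column names as keys.
writer = csv.DictWriter(outfile, fieldnames=old_order, lineterminator='\n')
for row in csv.DictReader(io.StringIO(sample)):
    writer.writerow(row)

result = outfile.getvalue()
print(result)
```

This keeps two parallel lists that must stay aligned by position, which is presumably why the notebook's current code renames on read instead.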