comments which set emacs variables #77

zooko (Member) commented Dec 17, 2013

This makes it so that emacs knows the intended character encoding, BOM,
end-of-line markers, and standard line-width of these files.

Also this is a form of documentation. It means that you should put only
utf-8-encoded things into text files, only utf-8-encoded things into source
code files (and actually you should put only ASCII-encoded things there,
except possibly in comments or docstrings!), and that you should line-wrap
everything at 77 columns wide.

It also specifies that text files should start with a "utf-8 BOM". (Brian
questions the point of this, and my answer is that it adds information and
doesn't hurt. Whether that information will ever be useful is an open
question.)

It also specifies that text files should have unix-style ('\n') end-of-line
markers, not windows-style or old-macos-style.
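For illustration only, the three conventions described above (utf-8 BOM, unix end-of-line markers, 77-column wrap) could be checked with a sketch like the following. It is not part of the patch: the name `check_conventions` and the wording of the messages are hypothetical, and it is written for Python 3 rather than the Python 2 used by the script below.

```python
import codecs

FILL_COLUMN = 77  # the line width named in the description above


def check_conventions(data):
    """Return a list of problems found in the raw bytes of one file.

    Hypothetical helper, not part of the patch.
    """
    problems = []
    if not data.startswith(codecs.BOM_UTF8):
        problems.append("missing utf-8 BOM")
    if b"\r" in data:
        # Covers both windows-style (\r\n) and old-macos-style (\r).
        problems.append("non-unix end-of-line marker")
    try:
        # 'utf-8-sig' decodes utf-8 and drops a leading BOM if present.
        text = data.decode("utf-8-sig")
    except UnicodeDecodeError:
        problems.append("not valid utf-8")
        return problems
    for number, line in enumerate(text.split("\n"), start=1):
        # len() counts code points, which only approximates display columns.
        if len(line) > FILL_COLUMN:
            problems.append(
                "line %d is wider than %d columns" % (number, FILL_COLUMN))
    return problems
```

Calling it on `open(fname, 'rb').read()` for each file would give a read-only report without rewriting anything.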

I generated this patch by writing and running the following script, and then
reading the resulting diff to make sure it was correct. I then undid the
changes that the script had done to the files inside the
"setuptools-0.6c16dev4.egg" directory before committing the patch.

------- begin appended script::

import os

magic_header_line_comment_prefix = {
    '.py': u"# ",
    '.rst': u".. ",
    }

def format():
    for dirpath, dirnames, filenames in os.walk('.'):
        for filename in filenames:
            ext = os.path.splitext(filename)[-1]
            if ext in ('.py', '.rst'):
                fname = os.path.join(dirpath, filename)
                info = open(fname, 'rU')
                formattedlines = [ line.decode('utf-8') for line in info ]
                info.close()

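                if len(formattedlines) == 0:
                    # Skip empty files but keep walking the tree; a bare
                    # return here would stop at the first empty file.
                    continue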

                outfo = open(fname, 'w')
                outfo.write(u"\ufeff".encode('utf-8'))

                commentsign = magic_header_line_comment_prefix[ext]

                firstline = formattedlines.pop(0)
                while firstline.startswith(u"\ufeff"):
                    firstline = firstline[len(u"\ufeff"):]
                if firstline.startswith(u"#!"):
                    outfo.write(firstline.encode('utf-8'))
                    outfo.write(commentsign.encode('utf-8'))
                    outfo.write("-*- coding: utf-8-with-signature-unix; fill-column: 77 -*-\n".encode('utf-8'))
                else:
                    outfo.write(commentsign.encode('utf-8'))
                    outfo.write("-*- coding: utf-8-with-signature-unix; fill-column: 77 -*-\n".encode('utf-8'))
                    if (commentsign in firstline) and ("-*-" in firstline) and ("coding:" in firstline):
                        print "warning there was already a coding line %r in %r"  % (firstline, fname)
                    else:
                        outfo.write(firstline.encode('utf-8'))

                for l in formattedlines:
                    if (commentsign in l) and ("-*-" in l) and ("coding:" in l):
                        print "warning there was already a coding line %r in %r"  % (l, fname)
                    else:
                        outfo.write(l.encode('utf-8'))
                outfo.close()

if __name__ == '__main__':
    format()
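The script above is Python 2 (print statement, `str.decode`). As a rough sketch of the same header logic for Python 3, the per-file transformation can be written as a pure function on already-decoded text. The names `add_header` and `is_coding_line` are hypothetical, and unlike the script this version drops an existing coding line silently instead of printing a warning.

```python
BOM = "\ufeff"
HEADER = "-*- coding: utf-8-with-signature-unix; fill-column: 77 -*-\n"


def is_coding_line(line, commentsign):
    # Same three-part test the script uses to spot an existing coding line.
    return commentsign in line and "-*-" in line and "coding:" in line


def add_header(text, commentsign):
    """Return text with a BOM and the emacs header line added.

    Hypothetical helper; commentsign is "# " for .py and ".. " for .rst.
    """
    # Drop any existing BOMs and normalise to unix end-of-line markers.
    body = text.lstrip(BOM).replace("\r\n", "\n").replace("\r", "\n")
    lines = body.splitlines(keepends=True)
    if not lines:
        return text  # leave empty files untouched
    out = [BOM]
    if lines[0].startswith("#!"):
        out.append(lines.pop(0))  # a shebang has to stay on the first line
    out.append(commentsign + HEADER)
    out.extend(l for l in lines if not is_coding_line(l, commentsign))
    return "".join(out)
```

Because old coding lines are filtered out, running it twice on the same text gives the same result as running it once.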

zooko (Member, Author) commented Dec 17, 2013

wait a minute ... don't merge this yet...

zooko closed this Dec 17, 2013

daira (Member) commented Dec 19, 2013

I don't agree with this patch (either with using a fill column of 77 for Python source, or with adding UTF-8 BOMs to Python source, especially for files that are ASCII-only). Can we have a ticket where we can discuss and review this properly?

daira (Member) commented Dec 19, 2013

Oh, there is a ticket (#2138). Sorry, I'll comment there.
