Parties are messy #10

Open
GPHemsley opened this issue Jan 7, 2013 · 15 comments

Comments

@GPHemsley
Member

Much of the party information is flawed or straight-up inaccurate. In addition, it doesn't seem like there has been any attempt to standardize the names—a few names differ only in capitalization (e.g. "Pro-administration" vs. "Pro-Administration").

Here's the full list of parties used:
[
'AL',
'Adams',
'Adams Democrat',
'American',
'American Labor',
'Anti Jackson',
'Anti Jacksonian',
'Anti Mason',
'Anti Masonic',
'Anti-Administration',
'Anti-Jacksonian',
'Anti-Lecompton Democrat',
'Anti-administration',
'Coalitionist',
'Conservative',
'Conservative Republican',
'Constitutional Unionist',
'Crawford Republican',
'Democrat',
'Democrat Farmer Labor',
'Democrat-Liberal',
'Democrat-turned-Republican',
'Democrat/Independent',
'Democrat/Republican',
'Democratic',
'Democratic - Republican',
'Democratic Republican',
'Democratic and Union Labor',
'Democratic-Republican',
'Farmer-Labor',
'Federalist',
'Free Silver',
'Free Soil',
'Ind. Democrat',
'Ind. Republican',
'Ind. Republican-Democrat',
'Ind. Whig',
'Independent',
'Independent Democrat',
'Independent/Republican',
'Jackson',
'Jackson Republican',
'Jacksonian',
'Jacksonian Republican',
'Law and Order',
'Liberal',
'Liberal Republican',
'Liberty',
'National Greenbacker',
'New Progressive',
'Nonpartisan',
'Nullifier',
'Popular Democrat',
'Populist',
'Pro-Administration',
'Pro-administration',
'Progressive',
'Progressive Republican',
'Prohibitionist',
'Readjuster',
'Readjuster Democrat',
'Republican',
'Republican-Conservative',
'Silver',
'Silver Republican',
'Socialist',
'States Rights',
'Unconditional Unionist',
'Union',
'Union Democrat',
'Union Labor',
'Unionist',
'Unknown',
'Whig',
'no party'
]
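One way to surface the spellings that differ only in capitalization, spacing, or hyphenation is to group the names under a normalized key. This is a minimal sketch over a subset of the list above; the helper names `variant_key` and `find_variants` are made up for illustration and are not part of the repository's scripts.

```python
import re
from collections import defaultdict

# A subset of the party strings listed above.
PARTIES = [
    "Anti Jacksonian",
    "Anti-Jacksonian",
    "Anti-Administration",
    "Anti-administration",
    "Democratic - Republican",
    "Democratic Republican",
    "Democratic-Republican",
    "Pro-Administration",
    "Pro-administration",
    "Whig",
]


def variant_key(name):
    """Lowercase the name and collapse runs of spaces/hyphens into one space."""
    return re.sub(r"[\s\-]+", " ", name.strip().lower())


def find_variants(names):
    """Return only the groups that contain more than one spelling."""
    groups = defaultdict(list)
    for name in names:
        groups[variant_key(name)].append(name)
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}


variants = find_variants(PARTIES)
for key, spellings in sorted(variants.items()):
    print(key, "->", spellings)
```

The key is only for detecting candidates. Whether two grouped spellings really denote the same party still needs a human decision.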

(Note: Legislators are said to be in the "Democrat" party, while executives are in the "Democratic" party; the latter is the appropriate one.)

I would recommend consolidating some of these, and perhaps having a separate file that maps names to abbreviations, which should be distinct.

In addition, I think that changing parties mid-term should be shown with two terms, but I suppose that's debatable. (Another option is to only list the party at the time of election.) As it stands, when a candidate changes parties mid-term, they get a party like "Democrat/Independent" or similar.

This page has abbreviations for some of the more prominent parties (perhaps just the ones in the Senate), but I don't think it covers all the parties used here:
http://www.senate.gov/artandhistory/history/common/generic/Key_Party_Abbreviations.htm

@konklone
Member

konklone commented Jan 7, 2013

While I don't think we need to have a separate mapping file, I would definitely like to consolidate and correct these - and document somewhere what the possible parties are (since it's not like they're changing very fast...).

I don't think it'd be easy to re-determine these en masse, as the Bioguide's text would be hard to parse with accuracy, especially for people who switch parties at some point. But we could probably generate some logic to transform the parties we have - these spellings become this spelling, anything with a slash that indicates a party change we flag for manual attention, etc. It'd be a one-time thing too; it wouldn't need maintenance.

You'd be welcome to take a shot at it, and I'm happy to take a shot myself before long if you don't feel like it.
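The one-time transformation described above (known spellings map to one spelling, anything with a slash gets flagged) might look roughly like this. It is only a sketch: the `CANONICAL` table and its target spellings are placeholders, since the thread has not settled on preferred names, and `normalize_party` / `normalize_all` are hypothetical helpers.

```python
# Hypothetical spelling map: variant -> preferred spelling.
# The preferred spellings here are placeholders, not agreed-upon names.
CANONICAL = {
    "Pro-administration": "Pro-Administration",
    "Anti-administration": "Anti-Administration",
    "Anti Jacksonian": "Anti-Jacksonian",
    "Democratic - Republican": "Democratic-Republican",
    "Democratic Republican": "Democratic-Republican",
}


def normalize_party(raw):
    """Return (party, needs_review).

    Strings containing a slash indicate a party change; they are left
    untouched and flagged for manual attention.
    """
    if "/" in raw:
        return raw, True
    return CANONICAL.get(raw, raw), False


def normalize_all(raw_parties):
    """Normalize a list of party strings and collect the flagged ones."""
    cleaned, flagged = [], []
    for raw in raw_parties:
        party, needs_review = normalize_party(raw)
        cleaned.append(party)
        if needs_review:
            flagged.append(raw)
    return cleaned, flagged


cleaned, flagged = normalize_all(
    ["Pro-administration", "Democrat/Independent", "Whig", "Democratic Republican"]
)
```

Note that the slash rule alone would not catch values such as "Democrat-turned-Republican", so a few entries would still need to be listed by hand.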


@JoshData
Member

JoshData commented Jan 7, 2013

Some of the party info probably came from the bioguide search listing pages: e.g. http://bioguide.congress.gov/biosearch/biosearch1.asp

I may have filled it in with other sources.

We should correct executive.yaml's parties for Democrats to match the legislators file, since the legislators file is in wider use than the executive file (which isn't used at all). 'Democrat' is correct insofar as it is the noun form (e.g. "I am a Democrat." is correct).

In current/recent data, the party for someone who switched is always the most recent party (for that term). I think that's a good rule. We might do that to fix the historical data, and also add a new field like we did for names that links parties to time periods in just the cases where the party changed. But we shouldn't split terms.

@JoshData
Member

JoshData commented Jan 7, 2013

Sorry, for the bioguide link, start at the home page http://bioguide.congress.gov/biosearch/biosearch.asp and just choose a state. That's what I meant.

@GPHemsley
Member Author

Yeah, the conversion part should be easy and rather straightforward. But first we should sort out what the appropriate names are.

I was under the impression that the party field represents the name of the party, not the noun used to describe a member of that party. (That is, a member of the XYZ Party would be listed as "XYZ", not as an "XYZan".) And if that is not the case, then I am of the opinion that it should be the case. This might be a benefit to using abbreviation codes from a separate file: We can use the abbreviation to point to the full party name (including "Party") or leave it out when appropriate ("unknown", "no party", "independent", etc.). (The other benefit being that we'd have the abbreviation for free if we want to, for example, have a party/state/district tag after the person's name.)

With regard to party switchers, I won't argue with using the party at the end of the term, though I'm not clear what you're referring to wrt "a new field like we did for names that links parties to time periods in just the cases where the party changed". What's an existing example of this that I can look at?

I think the party data is likely for the most part accurate (modulo the other issues discussed here), and I am happy to spot-check ones that seem inaccurate.

@konklone
Member

konklone commented Jan 7, 2013

I agree that the party names should be the names of the parties, and not the descriptor for its members. (Similarly, if we were writing out state names, we'd say "New York" and not "New Yorker".)

What Josh was referring to, I think, is the other_names we added for members who change their names, and a date of when they lost that name. Mary Bono Mack has an example on the README (https://github.com/unitedstates/congress-legislators#legislators-file-structure-and-overview). We could make an other_parties array, with the party name and an end date, if we wanted. If we're not going to split up terms by anything other than end of session/service, then that's probably a good idea.
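To make the other_parties idea concrete, here is a rough sketch of how a client could answer "what party was this person in on a given day?" from such an array. Everything in it is an assumption for illustration: the record, its dates, the `end` key, and the `party_on` helper are hypothetical, and the real field layout (if adopted) may differ.

```python
from datetime import date

# Hypothetical record; the `other_parties` field and all dates are illustrative.
legislator = {
    "name": "Example Member",
    "party": "Independent",  # most recent party
    "other_parties": [
        {"party": "Republican", "end": date(2001, 6, 1)},
    ],
}


def party_on(record, day):
    """Return the party held on `day`.

    Each `other_parties` entry is assumed to apply up to and including its
    `end` date; after the last end date the current `party` applies.
    """
    for entry in sorted(record.get("other_parties", []), key=lambda e: e["end"]):
        if day <= entry["end"]:
            return entry["party"]
    return record["party"]
```

For example, `party_on(legislator, date(2001, 1, 15))` gives the earlier party, while any date after the `end` date falls through to the current `party` field.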

I still don't think we need abbreviations, or a conversion file - we just have a standard name for every party, and leave off the word "Party". I'm open to it if there are reasons that would be incomplete. "Unknown", "No Party", and "Independent" are fine identifiers for me (though I think "No Party" and "Independent" mean the same thing and we should just use one of them).


@GPHemsley
Member Author

Alright, I'll let you two decide how to handle the party switching issue. Though I should note: sometimes mid-term party switching changes party control in Congress, as it did in the 107th Congress when Jim Jeffords changed from Republican to Independent, so it could be important to note a little further than just an "oh yeah by the way" field.

But I'd like to continue to argue for the party abbreviation mapping. One thing I edited into my last comment (which you probably didn't get via e-mail) was this:
"The other benefit being that we'd have the abbreviation for free if we want to, for example, have a party/state/district tag after the person's name."

So if someone wanted to display "Chuck Schumer [D-NY]", they'd have all that information for free, without having to reverse-engineer any of it. Similarly, if they wanted to display someone with a more obscure party abbreviation, like "Al Franken [DFL-MN]" or "Joe Lieberman [ID-CT]", they wouldn't have to do anything special—every party would be processed the same.

Along those same lines, the abbreviation/mapping would be a way to differentiate between party and identifier: DFL represents the "Democratic-Farmer-Labor Party", while ID represents "Independent Democrat", which is not actually a party. And "no party" is not strictly the same as "independent": George Washington is (almost?) always described as having no party, not as being an independent.
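A small sketch of the display-tag use case, assuming a lookup from party name to abbreviation. The "D", "DFL", and "ID" codes come from the examples above; "R", the shape of the table, and the `display_tag` helper are assumptions made for illustration.

```python
# Hypothetical lookup table. "D", "DFL" and "ID" come from the examples above;
# "R" and the table's shape are assumptions.
PARTY_ABBREVIATIONS = {
    "Democratic": "D",
    "Republican": "R",
    "Democratic-Farmer-Labor": "DFL",
    "Independent Democrat": "ID",
}


def display_tag(name, party, state, district=None):
    """Build a label such as 'Chuck Schumer [D-NY]'; add the district if given."""
    abbreviation = PARTY_ABBREVIATIONS[party]
    parts = [abbreviation, state]
    if district is not None:
        parts.append(str(district))
    return f"{name} [{'-'.join(parts)}]"
```

Every party goes through the same lookup, so an obscure code like "DFL" needs no special handling in the caller.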

Wikipedia is an excellent source to get all this (IMO, important) information that the BioGuide might not make clear.

@konklone
Member

konklone commented Jan 7, 2013

This is a very good point, I agree that people should have the ability to get those abbreviations (I didn't think that was the kind of abbreviation you meant).

Not to be a stick in the mud about the separate file, but I think we can still achieve this without that, by having it be a second field. In other words, have both "party" and "party_abbreviation" fields. I only continue to push this because having to link files together is a pain from a client parsing standpoint, and having separate files to interact with in our scripts is also a maintenance burden. I'd rather keep the data slightly denormalized.

@JoshData
Member

JoshData commented Jan 7, 2013

@GPHemsley It's actually even more complex than that. The party that members run under during the election may have no connection to whether they caucus with the Republicans or with the Democrats once elected (especially independents), and it's how they caucus that determines majority/minority control.

I'm not opposed to changing Democrat/Democratic, but please not right now. I'm still trying to catch my breath from last week. Gimme a few weeks.

Normalizing historical party names so that they're at least consistent sounds good to me.

Everything else sounds like you (@GPHemsley) should try it out on a separate branch/fork so we can see what it looks like and what the ramifications are.

@konklone
Member

konklone commented Jan 7, 2013

I don't think there's anything urgent here w/r/t Democratic/Democrat. Though, if we add a party abbreviation field, that would then be the thing to hinge one's logic on going forward, so that name corrections could be made to parties whenever, without breaking anyone's stuff.

@GPHemsley
Member Author

I was thinking about this some more, and I think that the benefits of having a separate field would outweigh the costs in the long-term (which doesn't necessarily have to start right this second).

There may be some use cases for having two separate fields and allowing them to vary independently, but I think for 99% of the cases, they would be redundant and would only add to the bulk of the file size. (And they potentially run the risk of becoming out of sync by accident.)

On the other hand, having the parties be represented by abbreviations or codes means they can easily be used and referenced in multiple different places.

So here's what I'm thinking:

  • Whatever party the person is currently a member of would be the code in the regular party field.
  • If the person has previously been a member of another party, then they also get a list of other_parties that lists party code and end date (and maybe start date, if deemed significant or necessary).
  • The party file can then contain information about the party: full name ("Democratic Party"), short name ("Democratic"), what a member is called ("Democrat"), etc. Some of these fields can be optional—with some kind of fallback cascade defined—since an "Independent" or someone who is "Anti-Administration" wouldn't truly have a party name. The file could even contain website information or dates of existence, if desired.
  • As Josh says, the most important party-related information for a particular term is which party the person caucuses with. So if a person changes from Republican to Independent, for example, but continues to caucus with the Republicans, then it isn't necessary to make any difference in notation of the terms. But if a person changes from Republican to Democratic, and thereby changes caucuses, I think it would be worth having an additional term to notate this. (Reverting my previous position of non-engagement.) Similarly, if a third party gains enough traction in the future, we might need a way to notate that a person is a member of the Green Party who caucuses with Democrats, or a member of the Conservative or Libertarian parties who caucuses with Republicans. (It might even be possible that different members of the same party caucus with different parties.) So the abbreviation code would be helpful here, as well.
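The party file with a "fallback cascade" from the list above could be sketched as follows. The field names (`full_name`, `short_name`, `member_noun`), the "I" and "AA" codes, and the `party_label` helper are all hypothetical; only the Democratic example values come from the proposal itself.

```python
# Hypothetical contents of a parties file, keyed by abbreviation code.
# Field names and the "I" / "AA" codes are assumptions.
PARTIES = {
    "D": {
        "full_name": "Democratic Party",
        "short_name": "Democratic",
        "member_noun": "Democrat",
    },
    "I": {"short_name": "Independent"},
    "AA": {"short_name": "Anti-Administration"},
}

# For each requested field, the fields to try in order.
FALLBACKS = {
    "full_name": ["full_name", "short_name"],
    "member_noun": ["member_noun", "short_name"],
    "short_name": ["short_name"],
}


def party_label(code, field):
    """Look up `field` for a party code, cascading to fallbacks when it is absent."""
    party = PARTIES[code]
    for candidate in FALLBACKS[field]:
        if candidate in party:
            return party[candidate]
    raise KeyError(f"no {field!r} or fallback for party code {code!r}")
```

With this shape, an entry like "Independent" only needs a short name, and a request for its full name falls back to that short name instead of failing.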

So that's my thinking on what the way forward should be, but I'm not at all averse to letting things settle down before attempting to make any changes.

@konklone
Member

konklone commented Jan 7, 2013

Redundancy and adding to file size aren't a big concern for me (I accept that as a price of denormalization).

I think you're making good points for why having a separate set of metadata around each party would be beneficial. I'm still not there yet on thinking it's worth having a parties.yaml file that needs to be referenced by anyone who wants to parse the contents of legislators-current.yaml file (my preference would even be to merge the -current and -historical files).

There's a tension between the terms list being precise and being understandable. I think there's (at least) 3 reasonable choices:

  • Terms are broken up every time any one of the fields changes - the chamber, the party, the name of the person, maybe even some day their gender (I believe this has happened in other countries). Terms also are broken up by session and by service. The reason for a term ending is noted. This creates a lot of terms, and a lot of reasons. The parsing of this array to precisely answer questions is, for some questions, extremely complicated, and for others, extremely simple.
  • Terms are broken up only by session/service, and show the state at which the person ended the term. (The way things are now.)
  • Two arrays are provided, one that does the first thing (phases? roles?), and one that does the second (terms). This makes answering more questions easy, doesn't make any kinds of questions harder, but is a much bigger maintenance burden (and duplicates a substantial amount of data). It's possible the terms array in this situation could be slimmed down to be vastly simpler - just the start and end dates (and reason for ending).

No. 2 in that list is by far the easiest on us, the maintainers. No. 1 is by far the hardest on client parsers. No. 3 is by far the hardest to transition to for both maintainers and existing parsers, but the most precise and useful.
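To illustrate how No. 3 relates the two arrays, here is a rough sketch that derives session/service terms from a finer-grained phases array. The `phases` structure, the `end_reason` values, and all dates are invented for illustration; nothing like this exists in the data today.

```python
# Hypothetical "phases" array for one member: a new phase starts whenever any
# field changes. All field names, reasons and dates are illustrative.
phases = [
    {"start": "2001-01-03", "end": "2001-06-01", "party": "Republican",
     "end_reason": "party change"},
    {"start": "2001-06-02", "end": "2003-01-03", "party": "Independent",
     "end_reason": "end of session"},
    {"start": "2003-01-07", "end": "2005-01-03", "party": "Independent",
     "end_reason": "end of session"},
]

# Reasons that close a term (option 2); any other reason only closes a phase.
TERM_ENDING_REASONS = {"end of session", "end of service"}


def phases_to_terms(phases):
    """Collapse phases into terms, each showing the state at the end of the term.

    Phases after the last term-ending reason (an unfinished term) are
    ignored in this sketch.
    """
    terms = []
    term_start = None
    for phase in phases:
        if term_start is None:
            term_start = phase["start"]
        if phase["end_reason"] in TERM_ENDING_REASONS:
            term = dict(phase)          # end-of-term state wins
            term["start"] = term_start  # but the term began with its first phase
            terms.append(term)
            term_start = None
    return terms


terms = phases_to_terms(phases)
```

Here the first two phases collapse into one term that starts with the first phase and carries the party held at the end, which matches how terms are recorded now.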

So we've been going with No. 2, and I've been fine with that. I would also be fine with No. 3, but I think we'd want to tackle it in full (not just making it so party breaks up the terms, but all relevant changes) rather than do a piece of it now but then realize that another field is also useful to hinge on later.

I'm with Josh on not making this drastic of a transition in the immediate future, but I do want to frame the issue for later thinking.

@GPHemsley
Member Author

Not realizing you made that comment, I began the argument about when to split terms in #15.

@parkr
Member

parkr commented May 28, 2013

Just perusing committee-membership-current.yaml, it appears party is "majority" or "minority", which is different than what the README would have you expect.

How are these YAML files updated? Are they updated by your scripts, or by hand?

@GPHemsley
Member Author

The committee files use a different format than the legislators/executive files. See further down in the README for their format.

@parkr
Member

parkr commented May 28, 2013

Ah, thanks!
