# Week 11 In-Class Challenge

This week, we are doing an in-class exercise.  This will be worth 5 extra credit points for each team that creates a successful solution that follows the programming guidelines we've established this semester.  All the requirements for this programming challenge are described below.  If you complete them all successfully, you will receive 5 points.  If you do not, you will receive 0 points.

Work as a group.  You will all receive the same number of points.

## Requirements
1. Your code must be a function named `week11()` that takes no parameters
2. Your `week11()` function must read this CSV from the internet and use it as input: https://hds5210-data.s3.amazonaws.com/Section111ValidICD10-Jan2024.csv
  * This file has three columns: CODE, SHORT DESCRIPTION, LONG DESCRIPTION, and NF EXCL
  * The NF EXCL indicates that this code is excluded from a "no fault" list related to workers compensation insurance claims
3. Your `week11()` function must use Pandas functions to generate new columns and filter the dataframe using the following rules
   * Create a new column called "CODE TYPE" that contains only the first character of the CODE column. For example if CODE="A001" then CODE TYPE="A"
   * Create a new column called "CODE NUM" that contains only the numeric part of the CODE column and make it numeric. For example if CODE="A001" then CODE NUM=1
   * Some CODE NUM portions cannot be converted directly because the have an "X" in them.  Convert that "X" to a "." and then conver the CODE NUM to a numeric value.  For example if CODE="E1037X1" then CODE NUM=1037.1
   * Filter your results to only include those rows where NF EXCL="Y"
   * Sort your results in ascending order by CODE NUM and then by CODE TYPE
4. Use the "checker" in the last cell to confirm that your results are correct.  If the checker gives any errors, you will receive no credit.


## Submitting
In Canvas, you will find an assignment called Week 11 In Class Group Extra Credit.  It may be at this link: https://canvas.slu.edu/courses/42884/assignments/362720

Submit the URL pointing to the file in GitHub you want me to review for your group submission.  Your URL should look something like this: https://github.com/paulboal/hds5210-2023/blob/main/week11/week11_inclass.ipynb


## Scoring Rubric
If your code passes my checker included at the bottom of this page, each person on your team will earn 5 points.  If you code does not pass my checker, you will earn 0 points. This is "all or nothing" extra credit.

---

In [206]:
# I've provide you code here to start with.

import pandas as pd

def week11():
    """() -> pd.DataFrame

    This function will process the file named in step 2 of the instructions above
    using the rules in step 3 above.  It will return a dataframe that contains
    the filtered, sorted, and enhanced results.

    For my tests, I will validate the shape to start with.
    If I have more time, I can figure out how to write tests for the other requirements.

    >>> week11().shape
    (1090, 6)
    """
    hospitals = pd.read_csv('https://hds5210-data.s3.amazonaws.com/Section111ValidICD10-Jan2024.csv')
    #Checking columns
    #print(hospitals.columns)
    #print(hospitals['CODE'])
    #adding the new columns
    hospitals['CODE TYPE']  = hospitals["CODE"].apply(lambda x:x[:1] if x[:1].isdigit()== None else x[:1])
    #print(hospitals['CODE TYPE'])
    hospitals['CODE NUM']  = hospitals["CODE"].apply(lambda x:x[1:] if ("X") in x[1:] == x[1:].replace("X", ".") else x[1:]).astype(float)
    # print(hospitals['CODE NUM'])
    # print(hospitals.columns)
    #filtering the column 'NF EXCL
    hospitals = hospitals.loc[hospitals['NF EXCL'] == "Y"]
    #checking if the filtering occurs
    #return hospitals.head()
    sorted_code_num = hospitals.sort_values(by='CODE NUM', ascending=True)
    #checking the sorting values
    #return sorted_code_num.head()
    sorted_code_type = sorted_code_num.sort_values(by='CODE TYPE', ascending=True)
    #print(sorted_code_type.isnull().sum())
    new_data = sorted_code_type.dropna()
    #return sorted_code_type.head()
    # Do your work here
    # and return a final data frame

    # This is a dummy piece of code that just passes my one doctest.
    # Obviously, it won't pass the checker at the bottom.
    # You'll want to delete this before you try checking your answer.
    final_data = pd.DataFrame(new_data)
    return final_data


In [207]:
week11()

Unnamed: 0,CODE,SHORT DESCRIPTION,LONG DESCRIPTION,NF EXCL,CODE TYPE,CODE NUM
11147,A7982,Anaplasmosis [A. phagocytophilum],Anaplasmosis [A. phagocytophilum],Y,A,7982.0
2524,B373,Candidiasis of vulva and vagina,Candidiasis of vulva and vagina,Y,B,373.0
8163,B3731,Acute candidiasis of vulva and vagina,Acute candisiasis of vulva and vagina,Y,B,3731.0
8164,B3732,Chronic candidiasis of vulva and vagina,Chronic candidiasis of vulva and vagina,Y,B,3732.0
11764,C8441,"Prph T-cell lymphoma, NEC, nodes of head, face...","Peripheral T-cell lymphoma, not elsewhere clas...",Y,C,8441.0
...,...,...,...,...,...,...
12514,Y9383,"Activity, rough housing and horseplay","Activity, rough housing and horseplay",Y,Y,9383.0
12512,Y9382,"Activity, spectator at an event","Activity, spectator at an event",Y,Y,9382.0
12508,Y9381,"Activity, refereeing a sports activity","Activity, refereeing a sports activity",Y,Y,9381.0
4754,Y761,Therapeutic and rehab ob/gyn devices assoc w i...,Therapeutic (nonsurgical) and rehabilitative o...,Y,Y,761.0


---

## You can run your doctests this way

In [208]:
from doctest import run_docstring_examples
run_docstring_examples(week11, globs=globals(), verbose=True)

Finding tests in NoName
Trying:
    week11().shape
Expecting:
    (1090, 6)
**********************************************************************
File "__main__", line 15, in NoName
Failed example:
    week11().shape
Expected:
    (1090, 6)
Got:
    (1096, 6)


---

## Use this code to check your output!

If you get something other than `"You did it!!"` then you still have work to do on your solution.

The feedback provided should give you some hints as to what you haven't done correctly in filtering and organizing the data.

You can run this as many times as you want.  I'm not recording who is trying what and if you're getting the right answers or not.

In [3]:
import requests

r = requests.post('https://rln3ys6dciybh6cydvapszesna0oxcyn.lambda-url.us-east-1.on.aws/',
                  headers={"content-type": "application/json"},
                  data=week11().to_json(orient='records'))

print(r.status_code)
print(r.text)

200
"It looks like you don't have the right columns. I want this: ['CODE', 'SHORT DESCRIPTION', 'LONG DESCRIPTION', 'NF EXCL', 'CODE TYPE', 'CODE NUM']. You gave me this: ['0', '1', '2', '3', '4', '5']"
