# Top and Bottom Performing
Let's look at how we might get the top performing stocks for a single period. For this example, we'll look at just a single month of closing prices:

In [12]:
!pip install Quandl

Collecting Quandl
  Downloading https://files.pythonhosted.org/packages/07/ab/8cd479fba8a9b197a43a0d55dd534b066fb8e5a0a04b5c0384cbc5d663aa/Quandl-3.5.0-py2.py3-none-any.whl
Collecting inflection>=0.3.1 (from Quandl)
  Downloading https://files.pythonhosted.org/packages/d5/35/a6eb45b4e2356fe688b21570864d4aa0d0a880ce387defe9c589112077f8/inflection-0.3.1.tar.gz
Building wheels for collected packages: inflection
  Running setup.py bdist_wheel for inflection ... done
  Stored in directory: /root/.cache/pip/wheels/9f/5a/d3/6fc3bf6516d2a3eb7e18f9f28b472110b59325f3f258fe9211
Successfully built inflection
Installing collected packages: inflection, Quandl
Successfully installed Quandl-3.5.0 inflection-0.3.1


In [16]:
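import quandl

# Configure the Quandl client
quandl.ApiConfig.api_key = 'xJq8FoisxxJrjCcLsE4P'
quandl.ApiConfig.api_version = '2015-04-09'
quandl.ApiConfig.verify_ssl = False  # note: this turns off SSL certificate verification

# Fetch the HKEX/01810 dataset and display its 'Previous Close' column
k = quandl.get('HKEX/01810')
k['Previous Close']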



Date
2018-07-09      NaN
2018-07-10    16.80
2018-07-11    19.00
2018-07-12    19.00
2018-07-13    19.26
2018-07-16    21.45
2018-07-17    21.05
2018-07-18    20.90
2018-07-19    21.55
2018-07-20    20.10
2018-07-23    19.88
2018-07-24    19.02
2018-07-25    18.24
2018-07-26    18.58
2018-07-27    18.30
2018-07-30    19.04
2018-07-31    18.68
2018-08-01    17.60
2018-08-02    17.86
2018-08-03    17.26
2018-08-06    17.00
2018-08-07    17.22
2018-08-08    17.42
2018-08-09    17.14
2018-08-10    18.08
2018-08-13    18.16
2018-08-14    17.44
2018-08-15    17.16
2018-08-16    16.30
2018-08-17    16.24
              ...  
2019-12-06     9.07
2019-12-09     9.34
2019-12-10     9.32
2019-12-11     9.21
2019-12-12     9.99
2019-12-13     9.93
2019-12-16    10.40
2019-12-17    10.52
2019-12-18    10.68
2019-12-19    10.50
2019-12-20    10.36
2019-12-23    10.34
2019-12-24    10.24
2019-12-27    10.56
2019-12-30    10.80
2019-12-31    10.64
2020-01-02    10.78
2020-01-03    11.22
2020-01-06    1

In [4]:
import pandas as pd

month = pd.to_datetime('02/01/2018')
close_month = pd.DataFrame(
    {
        'A': 1,
        'B': 12,
        'C': 35,
        'D': 3,
        'E': 79,
        'F': 2,
        'G': 15,
        'H': 59},
    [month])

close_month

            A   B   C  D   E  F   G   H
2018-02-01  1  12  35  3  79  2  15  59


In [5]:
xtime = pd.to_datetime('09/09/2019')
port_a = pd.DataFrame({'APPL': 999.9, 'MSFT': 1111}, [xtime])

port_a

             APPL  MSFT
2019-09-09  999.9  1111


`close_month` gives us the prices for February 2018 for all the stocks in this universe (A, B, C, ...). Looking at these prices, we can see that the top 2 performing stocks for that month were E and H, with prices of 79 and 59.

To get this using code, we can use the [`Series.nlargest`](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.Series.nlargest.html) function. This function returns the items with the *n* largest numbers. For the example we just talked about, our *n* is 2.

In [6]:
try:
    # Attempt to run nlargest
    close_month.nlargest(2)
except TypeError as err:
    print('Error: {}'.format(err))

Error: nlargest() missing 1 required positional argument: 'columns'


What happened here? It turns out we're not calling the [`Series.nlargest`](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.Series.nlargest.html) function; we're actually calling [`DataFrame.nlargest`](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.DataFrame.nlargest.html), since `close_month` is a DataFrame. Let's get the Series from the DataFrame using `.loc[month]`, where `month` is the 2018-02-01 index created above.

In [10]:
close_month.loc[month].nlargest(2)

E    79
H    59
Name: 2018-02-01 00:00:00, dtype: int64

Perfect! That gives us the top performing tickers for that month. Now, how do we get the bottom performing tickers? There are two ways to do this. You can use pandas' [`Series.nsmallest`](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.Series.nsmallest.html) function, or flip the sign on the prices and then apply [`Series.nlargest`](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.Series.nlargest.html). Either way is fine. For this course, we'll flip the sign and use nlargest. This lets us reuse any function built on nlargest to get the smallest values.
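
For comparison, here is a minimal sketch of the `nsmallest` route. It rebuilds `close_month` from the earlier cell so that it runs on its own; the variable name `bottom_2` is only illustrative.

```python
import pandas as pd

# Same single-month price table as in the cell above
month = pd.to_datetime('02/01/2018')
close_month = pd.DataFrame(
    {'A': 1, 'B': 12, 'C': 35, 'D': 3, 'E': 79, 'F': 2, 'G': 15, 'H': 59},
    [month])

# Select the row as a Series, then take the 2 smallest prices directly
bottom_2 = close_month.loc[month].nsmallest(2)
print(bottom_2)
```

This returns the tickers A and F with their original prices (1 and 2), so no sign flipping is involved.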

To get the bottom 2 performing tickers from `close_month`, we'll flip the sign.

In [12]:
(-1 * close_month).loc[month].nlargest(2)

A   -1
F   -2
Name: 2018-02-01 00:00:00, dtype: int64

That gives us the bottom performing tickers, but the prices shown are negated. To recover the actual prices, we can flip the sign of the output of nlargest.

In [13]:
(-1 * close_month).loc[month].nlargest(2) * -1

A    1
F    2
Name: 2018-02-01 00:00:00, dtype: int64
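
The reuse argument made earlier can be sketched as a pair of small helpers. The names `top_n` and `bottom_n` are hypothetical (they are not part of pandas or of this course's code), and the prices are the same illustrative values used in `close_month`.

```python
import pandas as pd


def top_n(prices_row, n):
    """Return the n largest values of a price Series (hypothetical helper)."""
    return prices_row.nlargest(n)


def bottom_n(prices_row, n):
    """Reuse top_n on the negated prices, then flip the sign back."""
    return -top_n(-prices_row, n)


# One row of prices, indexed by ticker
row = pd.Series({'A': 1, 'B': 12, 'C': 35, 'D': 3, 'E': 79, 'F': 2, 'G': 15, 'H': 59})

top = top_n(row, 2)        # tickers E and H
bottom = bottom_n(row, 2)  # tickers A and F, with their original prices
print(top)
print(bottom)
```

Because `bottom_n` only negates its input and output, any change made to `top_n` carries over to the bottom selection automatically.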

Now you've seen how to get the top and bottom performing prices in a single month. Let's see if you can apply this knowledge.
## Quiz
Implement `date_top_industries` to find the top performing closing prices and return their sectors for a single date. The function should return only the [set](https://docs.python.org/3/tutorial/datastructures.html#sets) of sectors, so the result contains no duplicates.

- The number of top performing prices to look at is represented by the parameter `top_n`.
- The `date` parameter is the date to look for the top performing prices in the `prices` DataFrame.
- The sector information for each ticker is located in the `sector` parameter.

For example:
```
                 Prices
               A         B         C         D         E
2013-07-08     2         2         7         2         6
2013-07-09     5         3         6         7         5
...            ...       ...       ...

           Sector
A       "Utilities"       
B       "Health Care"       
C       "Real Estate"
D       "Real Estate"
E       "Information Technology"

Date:  2013-07-09
Top N: 3
```
The set created from the function `date_top_industries` should be the following:
```
{"Utilities", "Real Estate"}
```
*Note: Stock A and E have the same price for the date, but only A's sector got returned. We'll keep it simple and only take the first occurrences of ties.*
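
To see where the tie-breaking in the note comes from, the example data can be rebuilt and inspected step by step. This is only a sketch: the variable names are illustrative, and only the two dates shown in the example table are included.

```python
import pandas as pd

# Prices from the example table (rows are dates, columns are tickers)
dates = pd.to_datetime(['2013-07-08', '2013-07-09'])
prices = pd.DataFrame(
    {'A': [2, 5], 'B': [2, 3], 'C': [7, 6], 'D': [2, 7], 'E': [6, 5]},
    index=dates)

# Sector for each ticker
sector = pd.Series({
    'A': 'Utilities',
    'B': 'Health Care',
    'C': 'Real Estate',
    'D': 'Real Estate',
    'E': 'Information Technology'})

date = pd.Timestamp('2013-07-09')

# nlargest keeps the first occurrence on ties by default,
# so A (5) is selected ahead of E (5)
top_prices = prices.loc[date].nlargest(3)
print(top_prices)

# Map the selected tickers to sectors; the set collapses the two "Real Estate" entries
top_sectors = sector.loc[top_prices.index]
unique_sectors = set(top_sectors)
print(unique_sectors)
```

The selected tickers are D, C, and A, which map to two distinct sectors.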

In [None]:
import project_tests


def date_top_industries(prices, sector, date, top_n):
    """
    Get the set of the top industries for the date
    
    Parameters
    ----------
    prices : DataFrame
        Prices for each ticker and date
    sector : Series
        Sector name for each ticker
    date : Date
        Date to get the top performers
    top_n : int
        Number of top performers to get
    
    Returns
    -------
    top_industries : set
        Top industries for the date
    """
    # Select the row of prices for `date` and keep the `top_n` highest values
    top_tickers = prices.loc[date].nlargest(top_n).index

    # Look up each top ticker's sector; converting to a set removes duplicates
    return set(sector.loc[top_tickers])


project_tests.test_date_top_industries(date_top_industries)

## Quiz Solution
If you're having trouble, you can check out the quiz solution [here](top_and_bottom_performing_solution.ipynb).