# Manipulating Data - Part 2

* We will go over some more important topics in data manipulation.
* We will primarily go over how to sort, merge, and transpose data.
* Even more importantly, we will introduce a very powerful tool called PROC SQL.

## The PROC

* Before we start talking about these things, we need to know more about the PROC. 
* The PROC step consists of a group of SAS statements that call and execute a procedure, usually with a SAS data set as input.
    * Use PROCs to analyze the data in a SAS data set, produce formatted reports or other results, or provide ways to manage SAS files.
    * You can modify PROCs with minimal effort to generate the output that you need.
    * PROCs can also perform functions, such as displaying information about a SAS data set.
* The output from a PROC step can provide univariate descriptive statistics, frequency tables, crosstabulation tables, tabular reports consisting of descriptive statistics, charts, plots, and so on.
    * Output can also be in the form of an updated data set.
* Because a PROC usually operates on a data set, we usually need to specify the DATA= option. For example:


In [None]:
PROC CONTENTS DATA = SASHELP.CARS;
RUN;

* This will yield an HTML page as output. These outputs are actually also tables. You can use ODS OUTPUT to capture these numbers in a data set. More later.
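
As a small preview of that idea, here is a minimal sketch that asks PROC CONTENTS to save its variable list as a data set. `Variables` is the ODS table name PROC CONTENTS uses for that list; the data set name CARS_VARS is only an illustrative choice, not something used elsewhere in these notes.

```sas
ODS OUTPUT Variables = CARS_VARS; /* capture the variable list as a data set */
PROC CONTENTS DATA = SASHELP.CARS;
RUN;

/* CARS_VARS is now an ordinary data set that we can print or process */
PROC PRINT DATA = CARS_VARS;
RUN;
```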

## Sort Data

* To sort data, we need to use PROC SORT. Let's try to use SASHELP.CARS as an example. 

In [None]:
DATA CARS; /* I have to create a copy of this because I am not allowed to change the data in the SASHELP library */
    SET SASHELP.CARS;
RUN;

PROC SORT DATA = CARS;
    BY Make;
RUN;

PROC SORT DATA = CARS;
    BY Type;
RUN;

PROC SORT DATA = CARS;
    BY Make Type;
RUN;

* We can add options to make the sort do more. For example, if we want to keep the original data unchanged and write the sorted data to another data set, we can use the OUT= option.

In [None]:
PROC SORT DATA = SASHELP.CARS OUT = CARS_SORTED;
    BY Make Type;
RUN;
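
By default, PROC SORT sorts in ascending order. As a further sketch, the DESCENDING keyword placed before a BY variable reverses the order for that variable only. The output name CARS_BY_PRICE is an illustrative choice.

```sas
/* Within each Make, list the most expensive cars first */
PROC SORT DATA = SASHELP.CARS OUT = CARS_BY_PRICE;
    BY Make DESCENDING MSRP; /* DESCENDING applies only to MSRP */
RUN;
```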

## Subsetting Data using ***DATA SET OPTION***

* We can subset the data using ***data set options***.  
    * ***(OBS = 10)*** is also a data set option. 
* Let's say we want to sort and output only the cars with DriveTrain = All and MSRP no more than $30,000.

In [None]:
PROC SORT DATA = SASHELP.CARS (where = (DriveTrain = "All" and MSRP <= 30000)) OUT = CARS_SORTED;
    BY Make Type;
RUN;
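
The (OBS = 10) option mentioned above is used the same way, in parentheses right after the data set name. A minimal sketch, combining it with KEEP= to print only the first 10 observations and three columns:

```sas
/* Data set options go in parentheses right after the data set name */
PROC PRINT DATA = SASHELP.CARS (OBS = 10 KEEP = Make Model MSRP);
RUN;
```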

* We can also subset columns using keep or drop. Let's say we want to only keep these four columns: Make Model MSRP Origin

In [None]:
PROC SORT DATA = SASHELP.CARS (where = (DriveTrain = "All" and MSRP <= 30000)) OUT = CARS_SORTED (keep = Make Model MSRP Origin);
    BY Make Type;
RUN;

* Why do I put where = on the input data set and keep = on the output data set? Let's put both on the input data set and see what happens.

In [None]:
PROC SORT DATA = SASHELP.CARS (keep = Make Model MSRP Origin where = (DriveTrain = "All" and MSRP <= 30000)) OUT = CARS_SORTED;
    BY Make Type;
RUN;

* Can you think about what the reported error means? 
* Why do you think the error is reported? 

* What if we try these?

In [None]:
PROC SORT DATA = SASHELP.CARS (where = (DriveTrain = "All" and MSRP <= 30000) keep = Make Model MSRP Origin) OUT = CARS_SORTED;
    BY Make Type;
RUN;

In [None]:
PROC SORT DATA = SASHELP.CARS (keep = Make Model MSRP Origin) OUT = CARS_SORTED;
    BY Make Type;
    where DriveTrain = "All" and MSRP <= 30000;
RUN;

* As you can see, ***DATA SET OPTIONS*** can be pretty versatile.
* They will become very handy really soon. With data set options, we can make the code shorter and are less likely to make mistakes.

* Sorting data can be slow, but it is crucial to a lot of operations in SAS. For example, let's print CARS_SORTED as an output. When you run the following code, you should be able to see the output window. 

In [None]:
PROC SORT DATA = SASHELP.CARS (where = (DriveTrain = "All" and MSRP <= 30000)) OUT = CARS_SORTED (keep = Make Model MSRP Origin);
    BY Make Type;
RUN;

PROC PRINT DATA = CARS_SORTED;
RUN;

* Now, I want to print the resulting data and group by each origin of the car. What can we do? 
* We can use the BY statement in PROC PRINT. The BY statement tells the PROC to process the data group by group.
* To use the BY statement, the data must be sorted by the group variable. Try the following. 

In [None]:
PROC SORT DATA = SASHELP.CARS (where = (DriveTrain = "All" and MSRP <= 30000)) OUT = CARS_SORTED (keep = Make Model MSRP Origin);
    BY Make Type;
RUN;

PROC PRINT DATA = CARS_SORTED;
    BY Origin; /* <--- The BY statement */
RUN;

* You should see an error saying the data is not sorted. How can we address this issue?
* So let's sort it properly and then rerun. Try to do it yourself before you look at the code.

In [None]:
PROC SORT DATA = SASHELP.CARS (where = (DriveTrain = "All" and MSRP <= 30000)) OUT = CARS_SORTED (keep = Make Model MSRP Origin);
    BY Origin;
RUN;

PROC PRINT DATA = CARS_SORTED;
    BY Origin;
RUN;

## Creating Multiple Data Sets

* When using DATA steps, we can create multiple data sets at the same time. All we need to do is: 

In [None]:
data sub1 sub2;

* Because data steps will create data sets first, this will create two data sets. 

* Then we need to load rows to the two data sets. 

In [None]:
data sub1 sub2;
    set SASHELP.CARS;
run;

* If we run this, we get two identical data sets. Both are the same as SASHELP.CARS.
* We can split SASHELP.CARS into sub1 and sub2 using some if conditions.
    * For example, how about we want sub1 to contain Asian cars and sub2 to contain all the others?
* To achieve this, we will need to use a data step statement called ***OUTPUT***.
    * By default, every DATA step contains an implicit OUTPUT statement at the end of each iteration that tells SAS to write observations to the data set or data sets that are being created.
    * Placing an explicit OUTPUT statement in a DATA step overrides the automatic output, and SAS adds an observation to a data set only when an explicit OUTPUT statement is executed.
    * Once you use an OUTPUT statement to write an observation to any one data set, however, there is no implicit OUTPUT statement at the end of the DATA step. In this situation, a DATA step writes an observation to a data set only when an explicit OUTPUT executes. You can use the OUTPUT statement alone or as part of an IF-THEN or SELECT statement or in DO-loop processing.
* Let's look at a couple of examples. 


In [None]:
data sub1 sub2; /* <-- Creating 2 data sets */
    set SASHELP.CARS; /* <-- Reading obs from SASHELP.CARS */
    if Origin = "Asia" then OUTPUT sub1;  /* <-- Output Asian cars to sub1 */
    else output sub2;  /* <-- Output others to sub2 */
run;

* We can also use OUTPUT statement to duplicate or triplicate rows. 

In [None]:
data CARs; /* <-- Creating 1 data set */
    set SASHELP.CARS; /* <-- Reading obs from SASHELP.CARS */
    output;
    output; /* <-- Duplicating */
    output; /* <-- Triplicating */
run;

* Why is this happening?
    * Recall that SAS DATA steps execute line-by-line and row-by-row.
    * In the code above we did the following in order
        1. **data CARs;** creates a data set called CARS
        2. **set SASHELP.CARS;** loads the first observation from SASHELP.CARS
        3. **output;** outputs that observation to the new data set called CARS
        4. **output;** outputs the same observation to the new data set because we have not read the 2nd observation yet.
        5. **output;** outputs the same observation once more to the new data set because we have not read the 2nd observation yet.
        6. Go to the 2nd observation and redo the 3 output statements.
        7. Repeat until the end of the data.
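
The OUTPUT statement can also be used inside a DO loop, as noted earlier. The following minimal sketch needs no input data: it creates one observation per loop iteration. The names squares, i, and square are made up for illustration.

```sas
data squares;
    do i = 1 to 5;
        square = i * i;
        output; /* <-- Writes one observation per iteration of the loop */
    end;
run;

proc print data = squares;
run;
```

Because the explicit OUTPUT executes five times, the data set has five observations. Without it, the implicit output at the end of the step would write only once, after the loop has finished.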

## Merge Data

* There are two major ways to merge data - DATA step or PROC SQL.
    * You rarely need anything else unless you have to handle very large data.

### Merge Data using DATA step

#### Base Example

* Load the data

In [None]:
FILENAME test temp; /* <-- I put the data on github. This little block saves you from downloading the data and uploading the data to SAS ODA */
proc http url='https://github.com/xieyutongcn/Statistical_Programming/raw/e89edba9a803dc61f62893f8a6c7d8c8e6a553eb/03/Data_To_Merge.xlsx' method="GET" out=test;
run;

/* The code below imports each sheet */
proc import file=test dbms=xlsx out=USA_CARS replace;
getnames=yes;
sheet="USA_Cars";
run;

proc import file=test dbms=xlsx out=German_CARS replace;
getnames=yes;
sheet="German_Cars";
run;

proc import file=test dbms=xlsx out=Japan_CARS replace;
getnames=yes;
sheet="Japan_Cars";
run;

FILENAME test temp;
proc http url='https://github.com/xieyutongcn/Statistical_Programming/raw/main/03/Cars.csv' method="GET" out=test;
run;

proc import file=test dbms=csv out=CARS replace;
getnames=yes;
run;


* Take a look at each data.
    * What are the identifying variables?
    * What information does each data set contain?
    * What are the overlapping columns? What are the unique columns? 

* Let's merge Cars with USA_Cars. 

In [None]:
DATA NEW;
    MERGE Cars USA_Cars;
    BY ID;
RUN;

<center><font size="+2">Anything wrong? What do we need to do before using a BY?</font></center>

<center><font size="+2">You are right! <b>SORT!</b></font></center>

In [None]:
PROC SORT DATA = Cars;
    BY ID;
RUN;

PROC SORT DATA = USA_Cars;
    BY ID;
RUN;

DATA NEW;
    MERGE Cars USA_Cars;
    BY ID;
RUN;

* There are a few possibilities for the target data set: 
    * Keep the observations that appear in both data (***inner join***)
    * Keep all the observations from one of the data sets, whether or not they appear in the other (***left or right join***)
    * Keep all observations that appear in either data set (***full join***)
    * Keep all observations from one data set that do not also appear in the other (***exclusion***)
* What join was the data we just created? 

#### Data set option ***in =***

* How do we achieve these different joins? 
    * We just need to use a data set option **in =**

In [None]:
DATA NEW; /* <-- We don't have to sort again because the sort is done. If you do not pass the data through another PROC that reorders it, the order will be kept. */
    MERGE Cars (in = In1) USA_Cars (in = In2);
    BY ID;
    In1_explicit = In1;
    In2_explicit = In2;
RUN;

* Let's see the data. 
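    * The **in =** option creates two temporary variables, **In1** and **In2**. They exist only during the DATA step and are not written to the output data set.
        * These are Boolean flags stored as numbers: 1 means TRUE and 0 means FALSE.
    * We create **In1_explicit** and **In2_explicit** as regular columns so that we can see the values of **In1** and **In2**.
    * They mark whether an observation received data from the first data set (**Cars**) and from the second data set (**USA_Cars**).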
* So to achieve inner, left, right, full or exclusion join, we need to use these two variables. 

#### Different Merges with Data Step and ***in =*** Option

In [None]:
/* Left Join */ 
DATA NEW;
    MERGE Data1 (in = In1) Data2 (in = In2);
    BY ID;
    if In1;
RUN;

In [None]:
/* Right Join */ 
DATA NEW;
    MERGE Data1 (in = In1) Data2 (in = In2);
    BY ID;
    if In2;
RUN;

In [None]:
/* Intersection (Inner Join) */ 
DATA NEW;
    MERGE Data1 (in = In1) Data2 (in = In2);
    BY ID;
    if In1 and In2;
RUN;

In [None]:
/* Exclusion */ 
DATA NEW;
    MERGE Data1 (in = In1) Data2 (in = In2);
    BY ID;
    if In1 and not In2;
RUN;
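
The list above has no full join. A minimal, self-contained sketch follows: with no subsetting IF, the merge keeps every ID from either data set. The two small data sets and their values are toy data invented for illustration, and From1 and From2 are illustrative names for the saved flags.

```sas
/* Toy data, already sorted by ID */
DATA Data1;
    INPUT ID Name $;
    DATALINES;
1 Ann
2 Bob
3 Cy
;
RUN;

DATA Data2;
    INPUT ID Score;
    DATALINES;
2 80
3 90
4 70
;
RUN;

/* Full Join: no subsetting IF, so every ID from either data set is kept */
DATA NEW;
    MERGE Data1 (in = In1) Data2 (in = In2);
    BY ID;
    From1 = In1; /* 1 if the ID was found in Data1 */
    From2 = In2; /* 1 if the ID was found in Data2 */
RUN;

PROC PRINT DATA = NEW;
RUN;
```

The result has one row for each of the IDs 1 to 4. Columns that come from the data set that does not contain the ID are missing.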

#### Let's Try This

* Create a data set that gives me all the information of German cars.

#### Calculate a Summary and Merge Back

In some scenarios, we need to add the average of a column to the data. 

For example, we want to select the data above or below the average. To do so, we need to calculate the average first and then compare each row with the average value. 

To calculate the average, we can use PROC MEANS and then merge the calculated mean data with the original data.

PROC MEANS is another important PROC. It can calculate statistics of numeric variables, such as average (mean), standard deviation, variance, minimum, maximum and so on. 

So let's spend some time on figuring out how it works. 

In [None]:
/* The simplest form of the syntax is like this. */
PROC MEANS DATA = SASHELP.CARS;
    VAR MSRP;
RUN;

By default, this will create an HTML output. However, we want a table so that we can merge back. Fortunately, PROC MEANS offers such an option. 

In [None]:
PROC MEANS DATA = SASHELP.CARS;
    VAR MSRP;
    OUTPUT OUT = AVERAGE MEAN = / AUTONAME; /* <-- Saves the calculated values in a data set called AVERAGE. AUTONAME names the new variable automatically by appending the statistic to the variable name (here MSRP_Mean). */
RUN;

/* or */

PROC MEANS DATA = SASHELP.CARS;
    OUTPUT OUT = AVERAGE MEAN(MSRP) = / AUTONAME;
RUN;

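Both will create the same data set AVERAGE and print the output. Note that the second form has no VAR statement, so its printed output covers all numeric variables, not just MSRP.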

We don't really need it to print the output because we are trying to get the average. We can specify the **NOPRINT** option. 

In [None]:
PROC MEANS DATA = SASHELP.CARS NOPRINT;
    OUTPUT OUT = AVERAGE MEAN(MSRP) = / AUTONAME;
RUN;

Then merge back. 

In [None]:
DATA NEW;
    IF _N_ = 1 THEN SET AVERAGE;
    SET SASHELP.CARS;
RUN;

In this DATA step, SASHELP.CARS is the data set with more than one observation (the original data) and AVERAGE is the data set with a single observation (the average). SAS reads SASHELP.CARS with a normal SET statement, one observation per iteration. SAS also reads AVERAGE with a SET statement, but only in the first iteration of the DATA step, when the SAS automatic variable _N_ equals 1. (More on this later.) SAS then retains the values of the variables from AVERAGE for all observations in NEW.

This works because variables that are read with a SET statement are automatically retained. Normally, you don't notice this because the retained values are overwritten by the next observation. But in this case the variables from AVERAGE are read once at the first iteration of the DATA step and then retained for all other observations. The effect is similar to a RETAIN statement (more on this later). This technique can be used any time you want to combine a single observation with many observations, without a common variable.
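
To finish the motivating task (selecting the rows above the average), the steps can be combined as in the following sketch. It assumes that AUTONAME names the mean MSRP_Mean, which is its usual pattern of variable name plus statistic. ABOVE_AVERAGE is an illustrative data set name.

```sas
PROC MEANS DATA = SASHELP.CARS NOPRINT;
    /* Drop the bookkeeping variables _TYPE_ and _FREQ_; keep only the mean */
    OUTPUT OUT = AVERAGE (DROP = _TYPE_ _FREQ_) MEAN(MSRP) = / AUTONAME;
RUN;

DATA ABOVE_AVERAGE;
    IF _N_ = 1 THEN SET AVERAGE; /* <-- Read the single average row once */
    SET SASHELP.CARS;
    IF MSRP > MSRP_Mean;         /* <-- Keep only cars priced above the average */
RUN;
```

Changing the last condition to MSRP < MSRP_Mean selects the cars below the average instead.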

### PROC SQL and Merge Data using PROC SQL

#### Brief Intro of PROC SQL

* PROC SQL brings the SQL language into SAS.
    * Consider PROC SQL as a SAS version of SQL.
        * SQL is a powerful database management language.
* A typical PROC SQL looks like this. 

In [None]:
PROC SQL;
    create table new_table as 
    select distinct a.*, b.*
    from old_table1 as a, old_table2 as b
    where a.id = b.id;
QUIT;

* Note that even though this is a long statement, I only have one **;** between **PROC SQL;** and **QUIT;**.
    * What's between is very similar to the SQL language. 
* Also note that this statement ends with **QUIT;** not **RUN;**

We can use PROC SQL to subset data, modify data, create new variables or even calculate complex statistical values. The following code will 
1. Create a data set called **CARS_SORTED** from SASHELP.CARS
2. Sort the data by Origin
3. Keep only Make Model MSRP Origin
4. Keep only DriveTrain = All and MSRP no more than 30000
5. Calculate the average MSRP for each manufacturer (Make), using only the rows that pass the filter in step 4

In [None]:
PROC SQL;
    create table CARS_SORTED as 
    select distinct Make, Model, MSRP, Origin, mean(MSRP) as Average_MSRP
    from SASHELP.CARS
    where DriveTrain = "All" and MSRP <= 30000
    group by Make
    order by Origin;
QUIT;

As you can see, SQL is very powerful. You can even put multiple SQL statements in one PROC SQL step. The statements will execute in order.

In [None]:
PROC SQL;

    create table CARS_SORTED as 
    select distinct Make, Model, MSRP, Origin, mean(MSRP) as Average_MSRP
    from SASHELP.CARS
    where DriveTrain = "All" and MSRP <= 30000
    group by Make
    order by Origin;

    create table CARS_SORTED2 as 
    select distinct Make, Model, MSRP, Origin, mean(MSRP) as Average_MSRP
    from SASHELP.CARS
    where DriveTrain = "All" and MSRP > 30000
    group by Make
    order by Origin;

QUIT;
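
The queries above select columns that are not in the GROUP BY clause (Model, MSRP, Origin), so SAS attaches each manufacturer's average to every one of its rows. For comparison, here is a sketch of a pure summary query that returns one row per manufacturer. MAKE_AVERAGE and N_Models are illustrative names.

```sas
PROC SQL;
    create table MAKE_AVERAGE as
    select Make, mean(MSRP) as Average_MSRP, count(*) as N_Models
    from SASHELP.CARS
    where DriveTrain = "All" and MSRP <= 30000
    group by Make   /* every selected column is either grouped or summarized */
    order by Make;
QUIT;
```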

#### Inner Join

In [None]:
PROC SQL;
Create table dummy as
Select * from A Inner Join B
on a.ID = b.id;
Quit;

PROC SQL;
Create table dummy as
Select * from A Join B
on a.ID = b.id;
Quit;

PROC SQL;
Create table dummy as
Select * from A, B
where a.ID = b.id;
Quit;

#### Left Join

In [None]:
PROC SQL;
Create table COMBINED as
Select * from A left Join B
on a.ID = b.id;
Quit;

#### Right Join

In [None]:
PROC SQL;
Create table COMBINED as
Select * from A right Join B
on a.ID = b.id;
Quit;

#### Full Join

In [None]:
PROC SQL;
Create table COMBINED as
Select * from A full Join B
on a.ID = b.id;
Quit;

#### Cross Join

In [None]:
PROC SQL;
Create table COMBINED as
Select * from A cross Join B;
Quit;
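
The DATA step section also showed an exclusion merge. One common way to sketch the same thing in PROC SQL is a left join that keeps only the rows with no match in B. The tables A and B below are toy data invented for illustration, and ONLY_IN_A is an illustrative table name.

```sas
/* Toy data */
data A;
    input ID Name $;
    datalines;
1 Ann
2 Bob
3 Cy
;
run;

data B;
    input ID Score;
    datalines;
2 80
3 90
4 70
;
run;

/* Exclusion: rows of A whose ID does not appear in B */
PROC SQL;
    create table ONLY_IN_A as
    select a.*
    from A as a left join B as b
    on a.ID = b.ID
    where b.ID is missing; /* no match in B leaves b.ID missing */
QUIT;
```

With this toy data, only ID 1 remains, because IDs 2 and 3 also appear in B.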

#### Let's Try This
* Create a data set that gives me all the information of German cars.
* Create a column including the average MSRP.
* Filter the new data with rows lower than average MSRP. 

### Data Merge Cheat Sheet

More can be done. Use this image as a reference. 

<img src="SQLJoins.png">

The following images compare the DATA step merge with the SQL merge.

<img src="Merging1.jpg">

<img src="Merging2.jpg">

## Reshaping Data

### The Basics

* To reshape data, we need PROC TRANSPOSE.
    * We can also use some DATA step tricks to do it, but they will come later.
* What do I mean by reshape?
    * This example reshapes from ***long*** to ***wide***. 

<img src="TransposeIllustration.jpg">

* Let's see the following simple case to learn the syntax. 

In [None]:
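DATA before; /* <-- Loading the data */
LENGTH CityID $ 1 City $ 11 Category $ 9; /* Without this, values longer than 8 characters (Los Angeles, Afternoon) are truncated */
INFILE DATALINES MISSOVER DSD DLM = ",";
INPUT CityID $ City $ Category $ Count ;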
datalines;
A, New York, Morning, 1006
A, New York, Afternoon, 1720
A, New York, Evening, 4402
B, Chicago, Morning, 4019
B, Chicago, Afternoon, 2657
B, Chicago, Evening, 2889
C, Los Angeles, Morning, 2981
C, Los Angeles, Afternoon, 2814
C, Los Angeles, Evening, 1015
;
RUN;

PROC TRANSPOSE data = before out = after;
run;

PROC TRANSPOSE data = before out = after;
by CityID;
id category;
var count;
run;

PROC PRINT DATA = after;
run;

PROC TRANSPOSE data = before out = after (drop = _name_); /* _NAME_ saves the name of the variable that's transposed */
by CityID City;
id category;
var count;
run;

PROC PRINT DATA = after;
run;

* To understand the syntax and the function of each part:
    * ***data =***: the data we are transposing
    * ***out =***: the data we want
    * ***by***: doing the transposing within each by group; the BY variables themselves aren't transposed; this will become the group id in the new data
    * ***id***: the new column name
    * ***var***: the value we are transposing

* Let's transpose it back to long

In [None]:
PROC TRANSPOSE data = after out = before_after;
run;

PROC PRINT DATA = before_after;
run;

In [None]:
PROC TRANSPOSE data = after out = before_after;
var morning afternoon evening;
by CityID City;
run;

PROC PRINT DATA = before_after;
run;

In [None]:
PROC TRANSPOSE data = after out = before_after (rename = (col1 = Count _name_ = Category));
var morning afternoon evening;
by CityID City;
run;

PROC PRINT DATA = before_after;
run;

In [None]:
/* This is the complete code */
/* This should give us two data sets with the same content - before and before_after */
DATA before; /* <-- Loading the data */
length City $ 11 Category $ 10; /* "Los Angeles" needs 11 characters */
INFILE DATALINES MISSOVER DSD DLM = ",";
INPUT CityID $ City $ Category $ Count ;
datalines;
A, New York, Morning, 1006
A, New York, Afternoon, 1720
A, New York, Evening, 4402
B, Chicago, Morning, 4019
B, Chicago, Afternoon, 2657
B, Chicago, Evening, 2889
C, Los Angeles, Morning, 2981
C, Los Angeles, Afternoon, 2814
C, Los Angeles, Evening, 1015
;
RUN;

PROC TRANSPOSE data = before out = after (drop = _name_);
by CityID City;
id category;
var count;
run;

PROC TRANSPOSE data = after out = before_after (rename = (col1 = Count _name_ = Category));
var morning afternoon evening;
by CityID City;
run;

PROC PRINT DATA = before;
run;

PROC PRINT DATA = before_after;
run;

* Try it yourself. Try to reshape the long data to wide and then back to long.
* Remember, you want to first know what you **have** and what you **want**. 

In [None]:
/* The following example is a more realistic example that uses a data file having 300 records in long format (50 wide records and six time points). */
data long; 
  input id year inc ; 
cards; 
 1 90 66483 
 1 91 69146 
 1 92 74643 
 1 93 79783 
 1 94 81710 
 1 95 86143 
 2 90 17510 
 2 91 17947 
 2 92 19484 
 2 93 20979 
 2 94 21268 
 2 95 22998 
 3 90 57947 
 3 91 62964 
 3 92 68717 
 3 93 70957 
 3 94 75198 
 3 95 75722 
 4 90 64831 
 4 91 71060 
 4 92 71918 
 4 93 72514 
 4 94 73100 
 4 95 74379 
 5 90 18904 
 5 91 19949 
 5 92 21335 
 5 93 22237 
 5 94 23829 
 5 95 23913 
 6 90 32057 
 6 91 34770 
 6 92 35834 
 6 93 37387 
 6 94 40899 
 6 95 42372 
 7 90 60551 
 7 91 64869 
 7 92 67983 
 7 93 70498 
 7 94 71253 
 7 95 75177 
 8 90 16553 
 8 91 18189 
 8 92 18349 
 8 93 19815 
 8 94 21739 
 8 95 22980 
 9 90 32611 
 9 91 33465 
 9 92 35961 
 9 93 36416 
 9 94 37183 
 9 95 40627 
10 90 61379 
10 91 66002 
10 92 67936 
10 93 70513 
10 94 74405 
10 95 76009 
11 90 24065 
11 91 24229 
11 92 25709 
11 93 26121 
11 94 26617 
11 95 28142 
12 90 32975 
12 91 36185 
12 92 37601 
12 93 41336 
12 94 43399 
12 95 43670 
13 90 69548 
13 91 71341 
13 92 72455 
13 93 76552 
13 94 80538 
13 95 85330 
14 90 50274 
14 91 53349 
14 92 55900 
14 93 59375 
14 94 61216 
14 95 63911 
15 90 72011 
15 91 73334 
15 92 76248 
15 93 77724 
15 94 78638 
15 95 80582 
16 90 18911 
16 91 20046 
16 92 21343 
16 93 21630 
16 94 22330 
16 95 23081 
17 90 68841 
17 91 75410 
17 92 80806 
17 93 81327 
17 94 81571 
17 95 86499 
18 90 28099 
18 91 30716 
18 92 32986 
18 93 36097 
18 94 39124 
18 95 39866 
19 90 17302 
19 91 18778 
19 92 18872 
19 93 19884 
19 94 20665 
19 95 21855 
20 90 16291 
20 91 16674 
20 92 16770 
20 93 17182 
20 94 17979 
20 95 18917 
21 90 43244 
21 91 46545 
21 92 47633 
21 93 50744 
21 94 54734 
21 95 59075 
22 90 56393 
22 91 59120 
22 92 60801 
22 93 61404 
22 94 63111 
22 95 69278 
23 90 47347 
23 91 49571 
23 92 50101 
23 93 51345 
23 94 56463 
23 95 56927 
24 90 16076 
24 91 17217 
24 92 17296 
24 93 17900 
24 94 18171 
24 95 18366 
25 90 65906 
25 91 69679 
25 92 76131 
25 93 77676 
25 94 81980 
25 95 85426 
26 90 58586 
26 91 61188 
26 92 66542 
26 93 69267 
26 94 71063 
26 95 74549 
27 90 61674 
27 91 66584 
27 92 69185 
27 93 75193 
27 94 78647 
27 95 81898 
28 90 31673 
28 91 31883 
28 92 32774 
28 93 34485 
28 94 36929 
28 95 39751 
29 90 63412 
29 91 67593 
29 92 69911 
29 93 73092 
29 94 80105 
29 95 81840 
30 90 27684 
30 91 28439 
30 92 30861 
30 93 31406 
30 94 32960 
30 95 35530 
31 90 71873 
31 91 76449 
31 92 80848 
31 93 88691 
31 94 94149 
31 95 97431 
32 90 62177 
32 91 63812 
32 92 64235 
32 93 65703 
32 94 69985 
32 95 71136 
33 90 37684 
33 91 38258 
33 92 39208 
33 93 39489 
33 94 39745 
33 95 41236 
34 90 64013 
34 91 66398 
34 92 71877 
34 93 75610 
34 94 76395 
34 95 79644 
35 90 16011 
35 91 16847 
35 92 17746 
35 93 19123 
35 94 19183 
35 95 19996 
36 90 49215 
36 91 52195 
36 92 52343 
36 93 56365 
36 94 58752 
36 95 59354 
37 90 15774 
37 91 16643 
37 92 17605 
37 93 18781 
37 94 18996 
37 95 19685 
38 90 29106 
38 91 31693 
38 92 31852 
38 93 34505 
38 94 35806 
38 95 36179 
39 90 25147 
39 91 26923 
39 92 28785 
39 93 30987 
39 94 34036 
39 95 34106 
40 90 71978 
40 91 79144 
40 92 80453 
40 93 86580 
40 94 95164 
40 95 96155 
41 90 46166 
41 91 47579 
41 92 49455 
41 93 53849 
41 94 56630 
41 95 57473 
42 90 55810 
42 91 59443 
42 92 65291 
42 93 66065 
42 94 69009 
42 95 74365 
43 90 49642 
43 91 50603 
43 92 53917 
43 93 54858 
43 94 58470 
43 95 59767 
44 90 21348 
44 91 22361 
44 92 23412 
44 93 24038 
44 94 24774 
44 95 25828 
45 90 44361 
45 91 48720 
45 92 51356 
45 93 54927 
45 94 56670 
45 95 58800 
46 90 56509 
46 91 60517 
46 92 61532 
46 93 65077 
46 94 69594 
46 95 73089 
47 90 39097 
47 91 40293 
47 92 43237 
47 93 44809 
47 94 48782 
47 95 53091 
48 90 18685 
48 91 19405 
48 92 20165 
48 93 20316 
48 94 22197 
48 95 23557 
49 90 73103 
49 91 76243 
49 92 76778 
49 93 82734 
49 94 86279 
49 95 86784 
50 90 48129 
50 91 49267 
50 92 53799 
50 93 58768 
50 94 63011 
50 95 66461 
; 
run; 

In [None]:
/* Solution is here */
proc transpose data=long out=wide prefix=inc;
    by id;
    id year;
    var inc;
run;

proc print data=wide;
run;

### Transposing Two Variables

* Simply run PROC TRANSPOSE once per variable, and then merge the results.

In [None]:
data long; 
  input famid year faminc spend ; 
cards; 
1 96 40000 38000 
1 97 40500 39000 
1 98 41000 40000 
2 96 45000 42000 
2 97 45400 43000 
2 98 45800 44000 
3 96 75000 70000 
3 97 76000 71000 
3 98 77000 72000 
; 
run ;

proc transpose data=long out=widef prefix=faminc;
   by famid;
   id year;
   var faminc;
run;

proc transpose data=long out=wides prefix=spend;
   by famid;
   id year;
   var spend;
run;

data wide;
    merge widef(drop=_name_) wides(drop=_name_);
    by famid;
run;

proc print data=wide;
run;

### Numeric and Character Variables

In [None]:
data long5; 
  length debt $ 3; 
  input famid year faminc spend debt $ ; 
cards; 
1 96 40000 38000 yes 
1 97 40500 39000 yes 
1 98 41000 40000 no 
2 96 45000 42000 yes 
2 97 45400 43000 no 
2 98 45800 44000 no 
3 96 75000 70000 no 
3 97 76000 71000 no 
3 98 77000 72000 no 
; 
run; 

proc transpose data=long5 out=widef prefix=faminc;
  by famid;
  id year;
  var faminc;
run;

proc transpose data=long5 out=wides prefix=spend;
  by famid;
  id year;
  var spend;
run;

proc transpose data=long5 out=wided prefix=debt;
  by famid;
  id year;
  var debt;
run;

data wide5 ;
  merge widef (drop=_name_) wides (drop =_name_) wided (drop=_name_);
  by famid ;
run;

proc print data=wide5;
run;