# **Big Data: Challenges and Strategies in Data Generation, Storage, and Retrieval**

**Step 1: Set Up AWS Account**

* If you don't have an AWS account, sign up for one on the AWS website.

* Go to the AWS Management Console.

![Screenshot 2024-02-09 173508.png](<attachment:Screenshot 2024-02-09 173508.png>)

**Step 2: Open DynamoDB Console**

In the AWS Management Console, navigate to the DynamoDB service.

![Screenshot 2024-02-09 173221.png](<attachment:Screenshot 2024-02-09 173221.png>)


**Step 3: Create a Table**

* Click on "Create Table."

* Enter a table name.

* Define the primary key (partition key and optional sort key).

* Configure settings like provisioned or on-demand capacity.

* Click "Create."
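The same table definition can also be expressed in code. The following is a minimal sketch of the parameters that boto3's `create_table` call expects. It assumes a table named `Order` (taken from the query caption in Step 5) with a numeric `Order_ID` partition key; the choice of key and the helper name `build_create_table_request` are illustrative assumptions, not part of the original walkthrough. The helper only builds the request, so it can be inspected without AWS credentials.

```python
def build_create_table_request(table_name, partition_key, sort_key=None,
                               on_demand=True, read_units=5, write_units=5):
    """Build keyword arguments for boto3's DynamoDB create_table call.

    partition_key and sort_key are (attribute_name, attribute_type) tuples,
    where the type is "S" (string), "N" (number) or "B" (binary).
    This helper is hypothetical; it is not part of boto3.
    """
    keys = [(partition_key, "HASH")]          # partition key
    if sort_key is not None:
        keys.append((sort_key, "RANGE"))      # optional sort key

    request = {
        "TableName": table_name,
        "KeySchema": [
            {"AttributeName": name, "KeyType": role}
            for (name, _), role in keys
        ],
        "AttributeDefinitions": [
            {"AttributeName": name, "AttributeType": attr_type}
            for (name, attr_type), _ in keys
        ],
    }
    if on_demand:
        request["BillingMode"] = "PAY_PER_REQUEST"
    else:
        request["BillingMode"] = "PROVISIONED"
        request["ProvisionedThroughput"] = {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        }
    return request


# Assumed table name and key, for illustration only
order_table_request = build_create_table_request("Order", ("Order_ID", "N"))

# With AWS credentials configured, the request could be sent like this:
# import boto3
# boto3.client("dynamodb").create_table(**order_table_request)
```

Only key attributes appear in `AttributeDefinitions`; other attributes such as `Price` or `City` do not need to be declared when the table is created.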


![Screenshot 2024-02-09 135852.png](<attachment:Screenshot 2024-02-09 135852.png>)

![Screenshot 2024-02-09 135953.png](<attachment:Screenshot 2024-02-09 135953.png>)

**Step 4: Add Values to the Table**

* Go to the "Items" tab in your table.

* Click "Create item."

* Add attributes and values for your item.

* Click "Save."
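Items can also be written from Python. The sketch below prepares one order (values taken from the exported results later in this notebook) for boto3's `Table.put_item`. The boto3 resource API does not accept Python `float` values, so the hypothetical helper `to_dynamodb_item` converts them to `Decimal`. The table name `Order` is an assumption.

```python
from decimal import Decimal


def to_dynamodb_item(row):
    """Return a copy of `row` with floats converted to Decimal.

    Going through str() avoids carrying binary floating-point noise
    into the Decimal value.
    """
    item = {}
    for name, value in row.items():
        if isinstance(value, float):
            value = Decimal(str(value))
        item[name] = value
    return item


order = {
    "Order_ID": 2,
    "Date_Created": "2019-11-08",
    "Category": "Furniture",
    "City": "Henderson",
    "Price": 731.94,
    "State": "Kentucky",
}
item = to_dynamodb_item(order)

# With AWS credentials configured, the item could be stored like this:
# import boto3
# table = boto3.resource("dynamodb").Table("Order")  # assumed table name
# table.put_item(Item=item)
```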


![Screenshot 2024-02-09 142143.png](<attachment:Screenshot 2024-02-09 142143.png>)


**Step 5: Query the Table**

* Go to the "Items" tab.

* Choose "Query" to filter items by key conditions, or "Scan" to read the entire table.

* Review the results.


*Initial table*

![Screenshot 2024-02-09 142449.png](<attachment:Screenshot 2024-02-09 142449.png>)


*Query:* `Select * from Order Where Price > $100.00`

![Screenshot 2024-02-09 143722.png](<attachment:Screenshot 2024-02-09 143722.png>)


*Results from the NoSQL Database sample*

![Screenshot 2024-02-09 143730.png](<attachment:Screenshot 2024-02-09 143730.png>)
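A price condition like the one above can also be run through the SDK. If `Price` is not part of the table's key (an assumption here), it cannot be used as a key condition, so the request becomes a scan with a filter expression. DynamoDB applies the filter after reading the items and returns results in pages, so the sketch below follows `LastEvaluatedKey` until the scan is complete. `scan_all` is a hypothetical helper; `client` is expected to behave like `boto3.client("dynamodb")`.

```python
def scan_all(client, **scan_kwargs):
    """Collect every item from a DynamoDB scan, following pagination."""
    items = []
    while True:
        page = client.scan(**scan_kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            # No more pages to fetch
            return items
        # Resume the scan where the previous page stopped
        scan_kwargs["ExclusiveStartKey"] = last_key


# Low-level client syntax: values are typed, {"N": "100"} is the number 100
price_filter = {
    "TableName": "Order",  # assumed table name
    "FilterExpression": "Price > :min_price",
    "ExpressionAttributeValues": {":min_price": {"N": "100"}},
}

# With AWS credentials configured:
# import boto3
# expensive_orders = scan_all(boto3.client("dynamodb"), **price_filter)
```

Because a filtered scan still reads every item, it consumes read capacity for the whole table. For frequent price lookups, a key or index designed around that access pattern is usually the better choice.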

**Step 6: Export Results to CSV**

* Execute a query or scan to get the results.

* Select the items you want to export.

* Click "Actions" and choose "Export to CSV."

* Save the CSV file to your local machine.
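When results are retrieved through code rather than the console, the export can be reproduced with Python's standard `csv` module. This is a minimal sketch, assuming the items have already been converted to plain dictionaries; the helper name `items_to_csv` is hypothetical and the two rows reuse values from the exported results shown later.

```python
import csv
import io


def items_to_csv(items, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


items = [
    {"Order_ID": 4, "City": "Concord", "Price": "45.9",
     "Product_Name": "Xerox 1967"},
    {"Order_ID": 6, "City": "Fremont, Quebec", "Price": "61.38",
     "Product_Name": "Acco Six-Outlet Power Strip, 4' Cord Length"},
]
csv_text = items_to_csv(items, ["Order_ID", "City", "Price", "Product_Name"])
print(csv_text)
```

The writer quotes only the fields that contain a comma, so values such as `Fremont, Quebec` survive a round trip through a CSV reader.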

DynamoDB provides SDKs for various programming languages (e.g., Python, Java), which you can use to interact with DynamoDB programmatically. The cells below load the exported CSV: the result of a typical query against a sample non-relational database, illustrating the kind of retrieval workflow that AWS DynamoDB supports for large data sets.


In [59]:
import pandas as pd

# Load the CSV file exported from the DynamoDB console in Step 6
results = pd.read_csv('C:\\Users\\yeiso\\Documents\\GitHub\\Special-Topics-in-Data-Analytics-CSIS-4260-002\\docs\\results.csv')


In [60]:
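# read_csv already returns a DataFrame, so this wrapper is optional
df = pd.DataFrame(results)

# Show the DataFrame serialized as CSV text; display() renders the string's repr,
# so line breaks appear as literal \r\n
display(df.to_csv(index=False))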

'Order_ID,Date_Created,Category,City,Price,Product_Name,State,Sub_Category\r\n3,2019-06-12,"""Office Supplies, Supplies""",Los Angeles,14.62,Self-Adhesive Address Labels for Typewriters by Universal,California,Labels\r\n2,2019-11-08,Furniture,Henderson,731.94,"""Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back""",Kentucky,"""Chairs, sits"""\r\n4,2020-04-15,Office Supplies,Concord,45.9,Xerox 1967,North Carolina,Paper\r\n6,2019-12-09,Office Supplies,"""Fremont, Quebec""",61.38,"""Acco Six-Outlet Power Strip, 4\' Cord Length""",Nebraska,Appliances\r\n1,2019-11-08,Furniture,Henderson,189.99,Bush Somerset Collection Bookcase,Kentucky,"""Bookcases, miscelaneous"""\r\n5,2019-12-05,Office Supplies,"""Seattle, New york""",907.15,Fellowes PB200 Plastic Comb Binding Machine,Washington,Binderss\r\n'

In [58]:
print(results.head())


   Order_ID Date_Created                     Category               City  \
0         3   2019-06-12  "Office Supplies, Supplies"        Los Angeles   
1         2   2019-11-08                    Furniture          Henderson   
2         4   2020-04-15              Office Supplies            Concord   
3         6   2019-12-09              Office Supplies  "Fremont, Quebec"   
4         1   2019-11-08                    Furniture          Henderson   

    Price                                       Product_Name           State  \
0   14.62  Self-Adhesive Address Labels for Typewriters b...      California   
1  731.94  "Hon Deluxe Fabric Upholstered Stacking Chairs...        Kentucky   
2   45.90                                         Xerox 1967  North Carolina   
3   61.38      "Acco Six-Outlet Power Strip, 4' Cord Length"        Nebraska   
4  189.99                  Bush Somerset Collection Bookcase        Kentucky   

                Sub_Category  
0                     Labels  


In [48]:
# info() prints its summary directly and returns None, so it needs no print()
results.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Order_ID      6 non-null      int64  
 1   Date_Created  6 non-null      object 
 2   Category      6 non-null      object 
 3   City          6 non-null      object 
 4   Price         6 non-null      float64
 5   Product_Name  6 non-null      object 
 6   State         6 non-null      object 
 7   Sub_Category  6 non-null      object 
dtypes: float64(1), int64(1), object(6)
memory usage: 516.0+ bytes


In [51]:
print(results.columns)


Index(['Order_ID', 'Date_Created', 'Category', 'City', 'Price', 'Product_Name',
       'State', 'Sub_Category'],
      dtype='object')


In [50]:
print(results.shape)


(6, 8)


In [52]:
print(results.dtypes)

Order_ID          int64
Date_Created     object
Category         object
City             object
Price           float64
Product_Name     object
State            object
Sub_Category     object
dtype: object
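
The output above shows two things worth cleaning before analysis: `Date_Created` is loaded as a generic `object` column, and some text values (for example in `Category` and `City`) still carry literal double quotes from the export. The following is a minimal sketch of one way to tidy this up. It uses three rows and five columns copied from the exported CSV, and the variable names are illustrative.

```python
import io

import pandas as pd

# Three rows copied from the exported CSV shown earlier
sample_csv = (
    'Order_ID,Date_Created,Category,City,Price\n'
    '3,2019-06-12,"""Office Supplies, Supplies""",Los Angeles,14.62\n'
    '2,2019-11-08,Furniture,Henderson,731.94\n'
    '6,2019-12-09,Office Supplies,"""Fremont, Quebec""",61.38\n'
)
orders = pd.read_csv(io.StringIO(sample_csv))

# Remove the literal double quotes left around some text values
for column in ["Category", "City"]:
    orders[column] = orders[column].str.strip('"')

# Parse the date strings into a proper datetime column
orders["Date_Created"] = pd.to_datetime(orders["Date_Created"])

# Reproduce the price condition from Step 5 on the local data
expensive = orders[orders["Price"] > 100]
print(expensive)
```

With real data, the same steps would be applied to `results` directly instead of the inline sample.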
