A system that can handle long-running processes in a distributed fashion
We need to be able to import products from a CSV file into a database. There are half a million products to import. The CSV file is available in compressed format: Large File processing - Assignment.
Sample rows:
| Name | Sku | Description |
|---|---|---|
| Bryce Jones | lay-raise-best-end | Art community floor adult your single type |
| John Robinson | cup-return-guess | Produce successful hot tree past action young |
After importing the data, we would like to run an aggregate query that gives the number of products sharing the same name.
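A minimal sketch of such an aggregate query, assuming a `products` table with `sku`, `name` and `description` columns and using SQLite only because it ships with Python. The table name, the `product_count` alias, the `count_products_by_name` helper and the third sample row are hypothetical and not taken from the assignment.

```python
import sqlite3


def count_products_by_name(conn):
    """Return (name, product_count) rows, most frequent names first."""
    query = """
        SELECT name, COUNT(*) AS product_count
        FROM products
        GROUP BY name
        ORDER BY product_count DESC, name ASC
    """
    return conn.execute(query).fetchall()


# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "sku TEXT PRIMARY KEY, name TEXT NOT NULL, description TEXT)"
)
conn.executemany(
    "INSERT INTO products (sku, name, description) VALUES (?, ?, ?)",
    [
        ("lay-raise-best-end", "Bryce Jones",
         "Art community floor adult your single type"),
        ("cup-return-guess", "John Robinson",
         "Produce successful hot tree past action young"),
        # Hypothetical extra row so that one name appears twice.
        ("example-sku-three", "Bryce Jones", "Row added for illustration"),
    ],
)

name_counts = count_products_by_name(conn)
```

The same `SELECT` could also populate a separate aggregated table with `INSERT INTO ... SELECT`, if a stored result is preferred over running the query on demand.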
- The code should follow OOP concepts
- Support for regular non-blocking parallel ingestion of the given file into a table. Consider how the design should scale if the file has to be processed in 2 minutes.
- Support for updating existing products in the table based on `sku` as the primary key
- All 500k rows to be inserted into a single table
- An aggregated table on above rows with `name` and `no. of products` as the columns
- You can use the programming language and framework of your choice
- You can choose a database of your preference
- You can use any design pattern you prefer to solve the above problems
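The ingestion and update requirements above could be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in this repository: the `ProductImporter` class, its parameters, the lowercase CSV header names (`name`, `sku`, `description`) and the choice of SQLite are all hypothetical. The upsert uses SQLite's `ON CONFLICT ... DO UPDATE` clause, which needs SQLite 3.24 or newer. For scale, 500,000 rows in 2 minutes works out to roughly 4,200 rows per second.

```python
import csv
import io
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Insert a row, or update name/description when the sku already exists.
UPSERT_SQL = """
    INSERT INTO products (sku, name, description)
    VALUES (?, ?, ?)
    ON CONFLICT(sku) DO UPDATE SET
        name = excluded.name,
        description = excluded.description
"""


class ProductImporter:
    """Hypothetical importer: reads CSV rows in chunks and upserts by sku."""

    def __init__(self, conn, chunk_size=10_000, workers=4):
        self.conn = conn
        self.chunk_size = chunk_size
        self.workers = workers
        conn.execute(
            "CREATE TABLE IF NOT EXISTS products ("
            "sku TEXT PRIMARY KEY, name TEXT NOT NULL, description TEXT)"
        )

    def _chunks(self, reader):
        # Yield lists of at most chunk_size rows until the reader is empty.
        while True:
            chunk = list(islice(reader, self.chunk_size))
            if not chunk:
                return
            yield chunk

    @staticmethod
    def _prepare(chunk):
        # Runs in a worker thread: turn CSV dicts into parameter tuples.
        return [
            (row["sku"].strip(), row["name"].strip(), row["description"].strip())
            for row in chunk
        ]

    def import_file(self, file_obj):
        """Upsert every row of the CSV and return the number of rows processed."""
        reader = csv.DictReader(file_obj)
        total = 0
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # map() yields prepared chunks in file order, so a later row
            # for the same sku overwrites an earlier one.
            for params in pool.map(self._prepare, self._chunks(reader)):
                with self.conn:  # one transaction per chunk
                    self.conn.executemany(UPSERT_SQL, params)
                total += len(params)
        return total


# Usage with a tiny in-memory CSV; the third row updates an existing sku.
sample = io.StringIO(
    "name,sku,description\n"
    "Bryce Jones,lay-raise-best-end,Art community floor adult your single type\n"
    "John Robinson,cup-return-guess,Produce successful hot tree past action young\n"
    "Bryce Jones,lay-raise-best-end,Updated description\n"
)
conn = sqlite3.connect(":memory:")
processed = ProductImporter(conn, chunk_size=2).import_file(sample)
```

In this sketch only the row preparation runs in worker threads, and a single connection performs the writes, because SQLite allows one writer at a time. With a server database, each worker could hold its own connection and write its chunk directly; with CPU-heavy parsing, a process pool may be a better fit than threads.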
See the updated readme inside the largeFileProcessor folder.
Phase 1 is delivered!!