<a href="https://colab.research.google.com/github/weedge/doraemon-nb/blob/main/my_colab_gpu_topk.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# task
The data set holds 8.5 million docs; each doc is an integer id vector of up to 128 dimensions, with id values in the range 0-50000.
A query is also an integer id vector of up to 128 dimensions. The data set can be spread out for optimization.

```shell
# Generate test data (ids sorted in ascending order). By default the docs file holds one document per line, 10 documents; 10 query files are generated
make gen
```
For each query, find the top-k (k=100) docs ranked by their average intersection score with the query. The score is defined as follows:
every pair with query[i] == doc[j] (0<=i<query_size, 0<=j<doc_size) counts as one intersection, and the score is the number of intersections between query and doc divided by max(query_size, doc_size).

``` shell
./bin/query_doc_scoring <doc_file_name> <query_file_name> <output_filename>
```
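The scoring rule above can be sketched in a few lines of Python. This is only a reference sketch, not the code in `topk.cu`: the function names `intersection_score` and `topk_docs` are hypothetical, ids are assumed to be sorted ascending with no duplicates inside a vector, and ties are assumed to be broken by the lower doc index.

```python
import heapq


def intersection_score(query, doc):
    """Number of shared ids divided by max(len(query), len(doc)).

    Assumes both lists are sorted ascending and hold no duplicate ids.
    """
    if not query and not doc:
        return 0.0
    i = j = hits = 0
    # Two-pointer merge over the two sorted id lists.
    while i < len(query) and j < len(doc):
        if query[i] == doc[j]:
            hits += 1
            i += 1
            j += 1
        elif query[i] < doc[j]:
            i += 1
        else:
            j += 1
    return hits / max(len(query), len(doc))


def topk_docs(query, docs, k=100):
    """Return up to k (doc_index, score) pairs, best score first.

    Ties go to the lower doc index (an assumption, not from the task text).
    """
    scored = ((-intersection_score(query, doc), idx) for idx, doc in enumerate(docs))
    return [(idx, -neg_score) for neg_score, idx in heapq.nsmallest(k, scored)]
```

A heap-based selection keeps only k candidates at a time, which is the same idea as the CPU `partial_sort` baseline mentioned in the optimization notes.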

# optimize
Note: this only optimizes a stand-alone instance; a distributed map/reduce (fan-out/fan-in) architecture would schedule those instances.

0. GPU devices: round-robin (RR) balancing by user request
1. concurrency (CPU thread pool) + parallelism (CPU OpenMP + GPU warp threads): CPU (baseline) -> CPU thread concurrency -> CPU + GPU -> CPU thread concurrency/parallelism + GPU stream concurrency/warp thread parallelism => distributed
2. find or filter: use a hashmap/bitmap (bloom filter) in CPU memory, GPU global memory, or GPU shared memory
3. top-k sort: heap sort (partial_sort) on the CPU -> bitonic/radix sort for parallel top-k on the GPU, then reduce the top-k results back to the CPU
4. search: requires building an index (list (IVF, skip list), tree, graph), i.e. an ordered structure/model
5. SIMD: CPU instruction sets (Intel SSE, AVX2, AVX-512, etc.)
6. IO stream pipeline: reading the query/docs files (batch per thread, multibyte_split parallel accelerators) and writing the result file
7. resource pools
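As an illustration of the find/filter item, here is a minimal Python sketch of a query bitmap. It assumes ids fall in 0-50000 as stated in the task and that a doc holds no duplicate ids; the names `MAX_ID`, `build_query_bitmap`, `contains`, and `score_with_bitmap` are hypothetical and do not come from the project sources.

```python
MAX_ID = 50000  # upper bound of the id range given in the task


def build_query_bitmap(query):
    """Pack the query ids into a bitmap with one bit per possible id."""
    bitmap = bytearray((MAX_ID >> 3) + 1)
    for qid in query:
        bitmap[qid >> 3] |= 1 << (qid & 7)
    return bitmap


def contains(bitmap, doc_id):
    """Return 1 if doc_id is set in the bitmap, else 0."""
    return (bitmap[doc_id >> 3] >> (doc_id & 7)) & 1


def score_with_bitmap(bitmap, query_size, doc):
    """Intersection count via bitmap lookups, divided by max(query_size, doc_size)."""
    if query_size == 0 and not doc:
        return 0.0
    hits = sum(contains(bitmap, d) for d in doc)
    return hits / max(query_size, len(doc))
```

The bitmap for the full id range takes about 6 KB, so under these assumptions it would fit in GPU shared memory (48 KB per block on the T4 used here), and each doc id lookup is a constant-time bit test.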

# reference
- https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
- https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html
- https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html
- https://docs.nvidia.com/cuda/cuda-runtime-api/index.html
- https://docs.nvidia.com/cuda/thrust/index.html
- https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
- https://nvlabs.github.io/cub/index.html
- https://stotko.github.io/stdgpu/api/memory.html
- https://www.youtube.com/watch?v=cOBtkPsgkus
- **https://www.youtube.com/watch?v=Na9_2G6niMw**
- https://www.csd.uwo.ca/~mmorenom/HPC-Slides/Many_core_computing_with_CUDA.pdf
- [Exploring Performance Portability for Accelerators via High-level Parallel Patterns](https://scholar.google.com/citations?view_op=view_citation&hl=en&user=4Ab_NBkAAAAJ&citation_for_view=4Ab_NBkAAAAJ:hqOjcs7Dif8C), [PPT](https://pdfs.semanticscholar.org/b34a/f7c4739d622379fa31a1e88155335061c1b1.pdf)
- https://zhuanlan.zhihu.com/p/52344300
- https://passlab.github.io/OpenMPProgrammingBook/cover.html
- https://developer.nvidia.com/blog/maximizing-performance-with-massively-parallel-hash-maps-on-gpus/
- https://github.com/rapidsai/raft/blob/branch-23.12/docs/source/vector_search_tutorial.md


## view paper
1. [Fast Segmented Sort on GPUs.](https://raw.github.com/weedge/learn/main/gpu/Fast%20Segmented%20Sort%20on%20GPUs.pdf)
2. [Efficient Top-K query processing on massively parallel hardware](https://raw.githubusercontent.com/weedge/learn/main/gpu/Efficient%20Top-K%20Query%20Processing%20on%20Massively%20Parallel%20Hardware.pdf)
3. [stdgpu: Efficient STL-like Data Structures on the GPU](https://www.researchgate.net/publication/335233070_stdgpu_Efficient_STL-like_Data_Structures_on_the_GPU)
4. [Parallel Top-K Algorithms on GPU: A Comprehensive Study and New Methods](https://sc23.supercomputing.org/presentation/?id=pap294&sess=sess156)

## view code
1. https://github.com/rapidsai/cudf/pull/8702 , https://github.com/rapidsai/cudf/blob/branch-23.12/cpp/tests/io/text/multibyte_split_test.cpp
2. https://github.com/vtsynergy/bb_segsort (k/v), https://github.com/Funatiq/bb_segsort (k,k/v)
3. https://github.com/anilshanbhag/gpu-topk
4. https://github.com/heavyai/heavydb/blob/master/QueryEngine/TopKSort.cu
5. https://github.com/rapidsai/raft/blob/branch-23.12/cpp/include/raft/neighbors/detail/cagra/topk_for_cagra/topk_core.cuh
6. https://github.com/rapidsai/raft/blob/branch-23.12/cpp/include/raft/matrix/select_k.cuh , https://github.com/rapidsai/raft/blob/branch-23.12/cpp/test/matrix/select_k.cuh

## run baseline

In [2]:
!python --version

Python 3.10.12


In [None]:
!nvcc -h

In [1]:
!nvidia-smi

Tue Nov  7 08:16:38 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   44C    P8    10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
!nvidia-smi -q

In [2]:
!wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/nsight-systems-2023.2.3_2023.2.3.1001-1_amd64.deb
!apt update
!apt install ./nsight-systems-2023.2.3_2023.2.3.1001-1_amd64.deb
!apt --fix-broken install


--2023-11-07 08:16:46--  https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/nsight-systems-2023.2.3_2023.2.3.1001-1_amd64.deb
Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.199.20.126
Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.199.20.126|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 317705436 (303M) [application/x-deb]
Saving to: ‘nsight-systems-2023.2.3_2023.2.3.1001-1_amd64.deb’


2023-11-07 08:16:47 (260 MB/s) - ‘nsight-systems-2023.2.3_2023.2.3.1001-1_amd64.deb’ saved [317705436/317705436]

Hit:1 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:2 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [119 kB]
Get:3 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [109 kB]
Get:4 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]
Get:5 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,626 B]
Get:6 https://ppa.l

In [3]:
!wget "https://bj.bcebos.com/v1/ai-studio-online/9805dd2d2e8e472693efac637628e16b9f9c5be0fe30438bb4a80de3b386781a?responseContentDisposition=attachment%3B%20filename%3DSTI2_1017.zip&authorization=bce-auth-v1%2F5cfe9a5e1454405eb2a975c43eace6ec%2F2023-10-18T12%3A42%3A27Z%2F-1%2F%2F6b5388dcd9013bc9b340bb1806476afa938ce0c65f2f595e1a75f529e90e4187" -O STI2_1017.zip

--2023-11-07 08:59:49--  https://bj.bcebos.com/v1/ai-studio-online/9805dd2d2e8e472693efac637628e16b9f9c5be0fe30438bb4a80de3b386781a?responseContentDisposition=attachment%3B%20filename%3DSTI2_1017.zip&authorization=bce-auth-v1%2F5cfe9a5e1454405eb2a975c43eace6ec%2F2023-10-18T12%3A42%3A27Z%2F-1%2F%2F6b5388dcd9013bc9b340bb1806476afa938ce0c65f2f595e1a75f529e90e4187
Resolving bj.bcebos.com (bj.bcebos.com)... 103.235.46.61, 2409:8c04:1001:1002:0:ff:b001:368a
Connecting to bj.bcebos.com (bj.bcebos.com)|103.235.46.61|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1005669898 (959M) [application/octet-stream]
Saving to: ‘STI2_1017.zip’


2023-11-07 09:01:19 (10.9 MB/s) - ‘STI2_1017.zip’ saved [1005669898/1005669898]



In [4]:
!rm -rf STI2 && unzip STI2_1017.zip && mv STI2\ 2 STI2

Archive:  STI2_1017.zip
   creating: STI2 2/
  inflating: __MACOSX/._STI2 2       
   creating: STI2 2/bin/
  inflating: __MACOSX/STI2 2/._bin   
   creating: STI2 2/translate/
  inflating: __MACOSX/STI2 2/._translate  
  inflating: STI2 2/run.sh           
  inflating: __MACOSX/STI2 2/._run.sh  
  inflating: STI2 2/build.sh         
  inflating: __MACOSX/STI2 2/._build.sh  
   creating: STI2 2/src/
  inflating: __MACOSX/STI2 2/._src   
  inflating: STI2 2/bin/query_doc_scoring  
  inflating: __MACOSX/STI2 2/bin/._query_doc_scoring  
   creating: STI2 2/translate/res/
  inflating: __MACOSX/STI2 2/translate/._res  
   creating: STI2 2/translate/querys/
  inflating: __MACOSX/STI2 2/translate/._querys  
  inflating: STI2 2/translate/docs.txt  
  inflating: __MACOSX/STI2 2/translate/._docs.txt  
  inflating: STI2 2/src/topk.h       
  inflating: __MACOSX/STI2 2/src/._topk.h  
  inflating: STI2 2/src/topk.cu      
  inflating: __MACOSX/STI2 2/src/._topk.cu  
  inflating: STI2 2/src/main.cpp

In [3]:
!sh STI2/build.sh

build success


In [None]:
!STI2/bin/query_doc_scoring STI2/translate/docs.txt STI2/translate/querys ./res_2.txt

start get topk
query1.txt:10, 11, 16, 17, 42, 60, 22524, 22546, 22590, 22784, 23212, 23427, 23485, 23525, 23554, 24129, 24133, 24645, 24804, 24875, 25129, 25242, 25502, 25705, 25994, 26000, 26045, 26046, 26077, 26114, 26247, 26338, 26407, 27263, 27468, 27513, 28100, 40111, 40228, 40388, 41700, 45156, 45946, 46367, 47181, 47460, 47672
query2.txt:10, 16, 18, 21, 22, 23, 30, 42, 43, 44, 45, 54, 22497, 22512, 22524, 22533, 22535, 22608, 22624, 22790, 22828, 22836, 22885, 23188, 23381, 23409, 23558, 24103, 24197, 24250, 24496, 24918, 24974, 24987, 25179, 25317, 25827, 25994, 25996, 26009, 26015, 26023, 26030, 26050, 26052, 26082, 26096, 26205, 26247, 27399, 27475, 40029, 40300, 40416, 40504, 40696, 40837, 41166, 41172, 41336, 41407, 41516, 43247, 43309, 44547, 44795, 45101, 48828
query3.txt:11, 12, 13, 14, 21, 22, 23, 33, 42, 53, 61, 1380, 1545, 1546, 1557, 1560, 1566, 1569, 1583, 1646, 1759, 1762, 1787, 1794, 1877, 1882, 1892, 2069, 2120, 2146, 2368, 2670, 2888, 3022, 3327, 3335, 22460, 22

In [None]:
!nvcc STI2/src/main.cpp STI2/src/topk.cu -o STI2/bin/query_doc_scoring_gpu  \
	-ISTI2/src \
	-L/usr/local/cuda/lib64 -lcudart -lcuda \
	-std=c++11 \
	-O3 \
	-g


In [None]:
!STI2/bin/query_doc_scoring_gpu STI2/translate/docs.txt STI2/translate/querys ./res_3.txt

start get topk
query1.txt:10, 11, 16, 17, 42, 60, 22524, 22546, 22590, 22784, 23212, 23427, 23485, 23525, 23554, 24129, 24133, 24645, 24804, 24875, 25129, 25242, 25502, 25705, 25994, 26000, 26045, 26046, 26077, 26114, 26247, 26338, 26407, 27263, 27468, 27513, 28100, 40111, 40228, 40388, 41700, 45156, 45946, 46367, 47181, 47460, 47672
query2.txt:10, 16, 18, 21, 22, 23, 30, 42, 43, 44, 45, 54, 22497, 22512, 22524, 22533, 22535, 22608, 22624, 22790, 22828, 22836, 22885, 23188, 23381, 23409, 23558, 24103, 24197, 24250, 24496, 24918, 24974, 24987, 25179, 25317, 25827, 25994, 25996, 26009, 26015, 26023, 26030, 26050, 26052, 26082, 26096, 26205, 26247, 27399, 27475, 40029, 40300, 40416, 40504, 40696, 40837, 41166, 41172, 41336, 41407, 41516, 43247, 43309, 44547, 44795, 45101, 48828
query3.txt:11, 12, 13, 14, 21, 22, 23, 33, 42, 53, 61, 1380, 1545, 1546, 1557, 1560, 1566, 1569, 1583, 1646, 1759, 1762, 1787, 1794, 1877, 1882, 1892, 2069, 2120, 2146, 2368, 2670, 2888, 3022, 3327, 3335, 22460, 22

In [None]:
!diff res_3.txt STI2/translate/res/result.txt

1c1
< 3175
---
> 2990


In [None]:
!nvprof --print-gpu-trace STI2/bin/query_doc_scoring_gpu STI2/translate/docs.txt STI2/translate/querys ./res.txt

start get topk
query1.txt:10, 11, 16, 17, 42, 60, 22524, 22546, 22590, 22784, 23212, 23427, 23485, 23525, 23554, 24129, 24133, 24645, 24804, 24875, 25129, 25242, 25502, 25705, 25994, 26000, 26045, 26046, 26077, 26114, 26247, 26338, 26407, 27263, 27468, 27513, 28100, 40111, 40228, 40388, 41700, 45156, 45946, 46367, 47181, 47460, 47672
query2.txt:10, 16, 18, 21, 22, 23, 30, 42, 43, 44, 45, 54, 22497, 22512, 22524, 22533, 22535, 22608, 22624, 22790, 22828, 22836, 22885, 23188, 23381, 23409, 23558, 24103, 24197, 24250, 24496, 24918, 24974, 24987, 25179, 25317, 25827, 25994, 25996, 26009, 26015, 26023, 26030, 26050, 26052, 26082, 26096, 26205, 26247, 27399, 27475, 40029, 40300, 40416, 40504, 40696, 40837, 41166, 41172, 41336, 41407, 41516, 43247, 43309, 44547, 44795, 45101, 48828
query3.txt:11, 12, 13, 14, 21, 22, 23, 33, 42, 53, 61, 1380, 1545, 1546, 1557, 1560, 1566, 1569, 1583, 1646, 1759, 1762, 1787, 1794, 1877, 1882, 1892, 2069, 2120, 2146, 2368, 2670, 2888, 3022, 3327, 3335, 22460, 22

In [None]:
!ncu --set full --call-stack --nvtx -o report_gpu STI2/bin/query_doc_scoring_gpu STI2/translate/docs.txt STI2/translate/querys ./res.txt

start get topk
query1.txt:10, 11, 16, 17, 42, 60, 22524, 22546, 22590, 22784, 23212, 23427, 23485, 23525, 23554, 24129, 24133, 24645, 24804, 24875, 25129, 25242, 25502, 25705, 25994, 26000, 26045, 26046, 26077, 26114, 26247, 26338, 26407, 27263, 27468, 27513, 28100, 40111, 40228, 40388, 41700, 45156, 45946, 46367, 47181, 47460, 47672
query2.txt:10, 16, 18, 21, 22, 23, 30, 42, 43, 44, 45, 54, 22497, 22512, 22524, 22533, 22535, 22608, 22624, 22790, 22828, 22836, 22885, 23188, 23381, 23409, 23558, 24103, 24197, 24250, 24496, 24918, 24974, 24987, 25179, 25317, 25827, 25994, 25996, 26009, 26015, 26023, 26030, 26050, 26052, 26082, 26096, 26205, 26247, 27399, 27475, 40029, 40300, 40416, 40504, 40696, 40837, 41166, 41172, 41336, 41407, 41516, 43247, 43309, 44547, 44795, 45101, 48828
query3.txt:11, 12, 13, 14, 21, 22, 23, 33, 42, 53, 61, 1380, 1545, 1546, 1557, 1560, 1566, 1569, 1583, 1646, 1759, 1762, 1787, 1794, 1877, 1882, 1892, 2069, 2120, 2146, 2368, 2670, 2888, 3022, 3327, 3335, 22460, 22

In [44]:
!nvcc STI2/src/main.cpp topk/topk_query_stream.cu -o STI2/bin/query_doc_scoring_gpu_stream  \
	-ISTI2/src \
	-L/usr/local/cuda/lib64 -lcudart -lcuda \
	-std=c++11 \
	-O3 \
	-g

In [45]:
!STI2/bin/query_doc_scoring_gpu_stream STI2/translate/docs.txt STI2/translate/querys ./res_gpu_stream.txt

start get topk
query1.txt:10, 11, 16, 17, 42, 60, 22524, 22546, 22590, 22784, 23212, 23427, 23485, 23525, 23554, 24129, 24133, 24645, 24804, 24875, 25129, 25242, 25502, 25705, 25994, 26000, 26045, 26046, 26077, 26114, 26247, 26338, 26407, 27263, 27468, 27513, 28100, 40111, 40228, 40388, 41700, 45156, 45946, 46367, 47181, 47460, 47672
query2.txt:10, 16, 18, 21, 22, 23, 30, 42, 43, 44, 45, 54, 22497, 22512, 22524, 22533, 22535, 22608, 22624, 22790, 22828, 22836, 22885, 23188, 23381, 23409, 23558, 24103, 24197, 24250, 24496, 24918, 24974, 24987, 25179, 25317, 25827, 25994, 25996, 26009, 26015, 26023, 26030, 26050, 26052, 26082, 26096, 26205, 26247, 27399, 27475, 40029, 40300, 40416, 40504, 40696, 40837, 41166, 41172, 41336, 41407, 41516, 43247, 43309, 44547, 44795, 45101, 48828
query3.txt:11, 12, 13, 14, 21, 22, 23, 33, 42, 53, 61, 1380, 1545, 1546, 1557, 1560, 1566, 1569, 1583, 1646, 1759, 1762, 1787, 1794, 1877, 1882, 1892, 2069, 2120, 2146, 2368, 2670, 2888, 3022, 3327, 3335, 22460, 22

In [None]:
!diff ./res_gpu_stream.txt STI2/translate/res/result.txt

1c1
< 2850
---
> 2990


In [None]:
!nvprof --print-gpu-trace STI2/bin/query_doc_scoring_gpu_stream STI2/translate/docs.txt STI2/translate/querys ./res_gpu_stream.txt

start get topk
query1.txt:10, 11, 16, 17, 42, 60, 22524, 22546, 22590, 22784, 23212, 23427, 23485, 23525, 23554, 24129, 24133, 24645, 24804, 24875, 25129, 25242, 25502, 25705, 25994, 26000, 26045, 26046, 26077, 26114, 26247, 26338, 26407, 27263, 27468, 27513, 28100, 40111, 40228, 40388, 41700, 45156, 45946, 46367, 47181, 47460, 47672
query2.txt:10, 16, 18, 21, 22, 23, 30, 42, 43, 44, 45, 54, 22497, 22512, 22524, 22533, 22535, 22608, 22624, 22790, 22828, 22836, 22885, 23188, 23381, 23409, 23558, 24103, 24197, 24250, 24496, 24918, 24974, 24987, 25179, 25317, 25827, 25994, 25996, 26009, 26015, 26023, 26030, 26050, 26052, 26082, 26096, 26205, 26247, 27399, 27475, 40029, 40300, 40416, 40504, 40696, 40837, 41166, 41172, 41336, 41407, 41516, 43247, 43309, 44547, 44795, 45101, 48828
query3.txt:11, 12, 13, 14, 21, 22, 23, 33, 42, 53, 61, 1380, 1545, 1546, 1557, 1560, 1566, 1569, 1583, 1646, 1759, 1762, 1787, 1794, 1877, 1882, 1892, 2069, 2120, 2146, 2368, 2670, 2888, 3022, 3327, 3335, 22460, 22

In [None]:
!sleep 864000

## run topk

In [64]:
!make -C topk/ BUILD_TYPE=Release

make: Entering directory '/content/topk'
mkdir -p bin
g++ ./main.cpp -o ./bin/query_doc_scoring_cpu  \
	-I./ \
	-std=c++11 -Wall -march=native -pthread \
	-O3 \
	-g 
./main.cpp: In function ‘void doc_query_scoring_cpu(std::vector<std::vector<short unsigned int> >&, int, std::vector<std::vector<short unsigned int> >&, std::vector<short unsigned int>&, std::vector<std::vector<int> >&, std::vector<std::vector<float> >&)’:
  233 |         for (int id = 0; id < docs.size(); ++id) {
      |                          ~~~^~~~~~~~~~~~~
  244 |         for (int id = 0; id < docs.size(); id++) {
      |                          ~~~^~~~~~~~~~~~~
  247 |             for (int j = 0; j < doc.size(); j++) {
      |                             ~~^~~~~~~~~~~~
g++ ./main.cpp -o ./bin/query_doc_scoring_cpu_concurrency  \
	-I./ \
	-std=c++11 -Wall -march=native -pthread \
	-O3 \


In [47]:
!topk/bin/query_doc_scoring_cpu STI2/translate/docs.txt STI2/translate/querys ./cpu_res.txt

/bin/bash: line 1: topk/bin/query_doc_scoring_cpu: No such file or directory


In [48]:
!diff cpu_res.txt STI2/translate/res/result.txt

diff: cpu_res.txt: No such file or directory


In [49]:
!topk/bin/query_doc_scoring_cpu_concurency STI2/translate/docs.txt STI2/translate/querys ./cpu_concurency_res.txt

/bin/bash: line 1: topk/bin/query_doc_scoring_cpu_concurency: No such file or directory


In [50]:
!diff cpu_concurency_res.txt STI2/translate/res/result.txt

diff: cpu_concurency_res.txt: No such file or directory


In [19]:
!make -C topk/ build_cpu_gpu BUILD_TYPE=Release

make: Entering directory '/content/topk'
mkdir -p bin
nvcc ./main.cpp ./topk.cu -o ./bin/query_doc_scoring_cpu_gpu  \
	-I./ \
	-L/usr/local/cuda/lib64 -lcudart -lcuda \
	-std=c++11 -Xcompiler="-Wall -Wextra" -gencode arch=compute_70,code=sm_70 --expt-relaxed-constexpr \
	-O3 \
	-DGPU \
	-g
./main.cpp: In function ‘void doc_query_scoring_cpu(std::vector<std::vector<short unsigned int> >&, int, std::vector<std::vector<short unsigned int> >&, std::vector<short unsigned int>&, std::vector<std::vector<int> >&, std::vector<std::vector<float> >&)’:
  171 |         for (int id = 0; id < docs.size(); ++id) {
      |                          ~~~^~~~~~~~~~~~~
  182 |         for (int id = 0; id < docs.size(); id++) {
      |                          ~~~^~~~~~~~~~~~~
  185 |             for (int j = 0; j < doc.size(); j++) {
      |                             ~~^~~~~~~~~~~~

In [18]:
!topk/bin/query_doc_scoring_cpu_gpu STI2/translate/docs.txt STI2/translate/querys ./cpu_gpu_res.txt

/bin/bash: line 1: topk/bin/query_doc_scoring_cpu_gpu: No such file or directory


In [91]:
!diff cpu_gpu_res.txt STI2/translate/res/result.txt

1c1
< 2701
---
> 2990


In [54]:
!nvprof --print-gpu-trace topk/bin/query_doc_scoring_cpu_gpu STI2/translate/docs.txt STI2/translate/querys ./cpu_gpu_res_1.txt



In [55]:
!nsys profile  -o report_cpu_gpu.nsys-rep topk/bin/query_doc_scoring_cpu_gpu STI2/translate/docs.txt STI2/translate/querys ./cpu_gpu_res_1.txt


Executable not found in current directory or standard search paths


In [56]:
!ncu --set full --call-stack --nvtx -o report_cpu_gpu topk/bin/query_doc_scoring_cpu_gpu STI2/translate/docs.txt STI2/translate/querys ./cpu_gpu_res_1.txt

==ERROR== 'topk/bin/query_doc_scoring_cpu_gpu' does not exist or is not an executable. Please make sure to specify the absolute path to 'topk/bin/query_doc_scoring_cpu_gpu' if the executable is not in the local directory.


In [57]:
!make -C topk/ build_cpu_concurency_gpu BUILD_TYPE=Release

make: Entering directory '/content/topk'
make: *** No rule to make target 'build_cpu_concurency_gpu'.  Stop.
make: Leaving directory '/content/topk'


In [58]:
!topk/bin/query_doc_scoring_cpu_concurency_gpu STI2/translate/docs.txt STI2/translate/querys ./cpu_concurency_gpu_res.txt

/bin/bash: line 1: topk/bin/query_doc_scoring_cpu_concurency_gpu: No such file or directory


In [59]:
!diff cpu_concurency_gpu_res.txt STI2/translate/res/result.txt

diff: cpu_concurency_gpu_res.txt: No such file or directory


In [60]:
!nvprof --print-gpu-trace topk/bin/query_doc_scoring_cpu_concurency_gpu STI2/translate/docs.txt STI2/translate/querys ./cpu_concurency_gpu_res.txt



In [61]:
!nsys profile  -o report_cpu_concurency_gpu.nsys-rep topk/bin/query_doc_scoring_cpu_concurency_gpu STI2/translate/docs.txt STI2/translate/querys ./cpu_concurency_gpu_res.txt


Executable not found in current directory or standard search paths


In [62]:
!ncu --set full --call-stack --nvtx -o report_cpu_concurency_gpu topk/bin/query_doc_scoring_cpu_concurency_gpu STI2/translate/docs.txt STI2/translate/querys ./cpu_concurency_gpu_res.txt

==ERROR== 'topk/bin/query_doc_scoring_cpu_concurency_gpu' does not exist or is not an executable. Please make sure to specify the absolute path to 'topk/bin/query_doc_scoring_cpu_concurency_gpu' if the executable is not in the local directory.


## insert sort topk

In [None]:
!nvcc sum.cu -o sum

In [None]:
!./sum

Init input source[N]
CPU time: 317.27
GPU time: 11.21
Result: Error
GPU_result: 119571172;
CPU_result: 450029111;


In [None]:
!nvcc topk.cu -o topk

In [None]:
!./topk

Init source data...........
Complete init source data.....
GPU Run **************
GPU Complete!!!
CPU RUN***************
CPU Complete!!!!!CPU top1: 2147483611; GPU top1: 2147483611;
CPU top2: 2147483578; GPU top2: 2147483578;
CPU top3: 2147483526; GPU top3: 2147483526;
CPU top4: 2147483514; GPU top4: 2147483514;
CPU top5: 2147483491; GPU top5: 2147483491;
CPU top6: 2147483482; GPU top6: 2147483482;
CPU top7: 2147483417; GPU top7: 2147483417;
CPU top8: 2147483385; GPU top8: 2147483385;
CPU top9: 2147483327; GPU top9: 2147483327;
CPU top10: 2147483297; GPU top10: 2147483297;
CPU top11: 2147483267; GPU top11: 2147483267;
CPU top12: 2147483227; GPU top12: 2147483227;
CPU top13: 2147483204; GPU top13: 2147483204;
CPU top14: 2147483188; GPU top14: 2147483188;
CPU top15: 2147483183; GPU top15: 2147483183;
CPU top16: 2147483170; GPU top16: 2147483170;
CPU top17: 2147483156; GPU top17: 2147483156;
CPU top18: 2147483141; GPU top18: 2147483141;
CPU top19: 2147483140; GPU top19: 2147483140;
CPU to

## sample test

In [125]:
!make -C topk build_gpu_examples

make: Entering directory '/content/topk'
mkdir -p bin
nvcc ./example_bitonic_sort_topk.cu -o ./bin/example_bitonic_sort_topk  \
	-L/usr/local/cuda/lib64 -lcudart -lcuda \
	-std=c++11 -Xcompiler="-Wall -Wextra" -gencode arch=compute_70,code=sm_70 --expt-relaxed-constexpr \
	-O0 \
	-g


make: Leaving directory '/content/topk'


In [1]:
!sleep 86400

^C


In [15]:
!cd topk && g++ readfile.cpp -o bin/readfile --std=c++11 -O3

In [16]:
!topk/bin/readfile STI2/translate/docs.txt

docs_size:7853051 doc_lens_size:7853051
read file cost 36537 ms 


In [4]:
!cd topk && make build_cpu_gpu_doc_stream BUILD_TYPE=Release

mkdir -p bin
nvcc ./main.cpp ./topk_doc_stream.cu -o ./bin/query_doc_scoring_cpu_gpu_doc_stream  \
	-I./ \
	-L/usr/local/cuda/lib64 -lcudart -lcuda \
	-std=c++11 -Xcompiler="-Wall -Wextra" -gencode arch=compute_70,code=sm_70 --expt-relaxed-constexpr \
	-O3 \
	-DGPU \
	-g
./main.cpp: In function ‘void doc_query_scoring_cpu(std::vector<std::vector<short unsigned int> >&, int, std::vector<std::vector<short unsigned int> >&, std::vector<short unsigned int>&, std::vector<std::vector<int> >&, std::vector<std::vector<float> >&)’:
  171 |         for (int id = 0; id < docs.size(); ++id) {
      |                          ~~~^~~~~~~~~~~~~
  182 |         for (int id = 0; id < docs.size(); id++) {
      |                          ~~~^~~~~~~~~~~~~
  185 |             for (int j = 0; j < doc.size(); j++) {
      |                             ~~^~~~~~~~~~~~
  150 |      

In [5]:
!topk/bin/query_doc_scoring_cpu_gpu_doc_stream STI2/translate/docs.txt STI2/translate/querys ./res_gpu_doc_stream.txt

start get topk
query1.txt:10, 11, 16, 17, 42, 60, 22524, 22546, 22590, 22784, 23212, 23427, 23485, 23525, 23554, 24129, 24133, 24645, 24804, 24875, 25129, 25242, 25502, 25705, 25994, 26000, 26045, 26046, 26077, 26114, 26247, 26338, 26407, 27263, 27468, 27513, 28100, 40111, 40228, 40388, 41700, 45156, 45946, 46367, 47181, 47460, 47672
query2.txt:10, 16, 18, 21, 22, 23, 30, 42, 43, 44, 45, 54, 22497, 22512, 22524, 22533, 22535, 22608, 22624, 22790, 22828, 22836, 22885, 23188, 23381, 23409, 23558, 24103, 24197, 24250, 24496, 24918, 24974, 24987, 25179, 25317, 25827, 25994, 25996, 26009, 26015, 26023, 26030, 26050, 26052, 26082, 26096, 26205, 26247, 27399, 27475, 40029, 40300, 40416, 40504, 40696, 40837, 41166, 41172, 41336, 41407, 41516, 43247, 43309, 44547, 44795, 45101, 48828
query3.txt:11, 12, 13, 14, 21, 22, 23, 33, 42, 53, 61, 1380, 1545, 1546, 1557, 1560, 1566, 1569, 1583, 1646, 1759, 1762, 1787, 1794, 1877, 1882, 1892, 2069, 2120, 2146, 2368, 2670, 2888, 3022, 3327, 3335, 22460, 22

In [108]:
!diff ./res_gpu_doc_stream.txt STI2/translate/res/result.txt

1,10c1,10
< 6391
< 2095355	2104387	2118767	2147776	2055923	2206974	2238603	2020057	2267746	2220291	2151279	2154228	1960823	2001825	2005314	2029722	2054064	2098358	2099815	2287637	2300031	2158918	2167292	2177721	2177855	1981772	2196213	2520566	1917947	2017068	2028496	2083981	2323778	2247926	2282000	2202772	2227479	2236897	2186351	2028110	2079765	2486485	2150430	2381725	1760981	1855263	1869531	1877846	1885219	1896166	1950185	2119578	2126999	2306247	2315934	1776753	1811788	1848411	1901590	2026760	2042826	2073036	2217999	2224835	1709996	1754639	1844734	1987029	2091827	2093706	2284846	2151301	2328000	2351661	1706744	2081563	2597490	1759061	2011063	2077727	2737847	2362721	1956654	2193430	2236145	1462218	1816978	1897084	1897296	1902180	1925148	1933929	1952948	1958058	1975060	1984389	1984524	1987018	2034453	2070402
< 3354548	3385811	3457483	3584604	3229654	2491259	3320427	3340108	2374643	3657624	3516117	2403142	3631235	3648390	3732526	2575906	2228660	1839330	3110584	3242875	3405025	3557359	316

In [19]:
!cd topk && nvcc ./stream.cu -o ./bin/stream && ./bin/stream

Number of device(s): 1
Device 0
    Name:                    Tesla T4
    Glocbal memory:          15101.8 MB
    Shared memory per block: 48 KB
    Warp size:               32
    Max thread per block:    1024
    Thread dimension limits: 1024 x 1024 x 64
    Max grid size:           2147483647 x 65535 x 65535
    Compute capability:      7.5
 
Generating 7680 x 4320 BRGA8888 image, data size: 132710400
 
Computing results using CPU.
 
    Whole process took 497.971ms.
 
Computing results using GPU, default stream.
 
    Move data to GPU.
        Data transfer took 12.0095ms.
        Performance is 11.0504GB/s.
    Convert 8-bit BGRA to 8-bit YUV.
        Processing of 8K image took 1.70637ms.
        Performance is 77.7736GB/s.
    Move data to CPU.
        Data transfer took 8.13226ms.
        Performance is 12.2393GB/s.
    Whole process took 21.8481ms.
    Compare CPU and GPU results ...
        Results are the same.
 
Computing results using GPU, using 16 streams.
 
    Creating 

# rapidsai - cudf
Use cudf's `multibyte_split` to split the input strings chunk by chunk, accelerated on the GPU.

1. https://github.com/rapidsai/cudf/blob/branch-23.12/CONTRIBUTING.md#build-cudf-from-source
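The general idea of splitting a large text buffer chunk by chunk can be sketched in plain Python. This is not cudf's API or its implementation, only a CPU illustration under stated assumptions: the function name `split_in_chunks` and its parameters are hypothetical, each chunk reports the delimiters that start inside it (the part that could run in parallel), and a trailing delimiter does not produce an empty final record.

```python
def split_in_chunks(data: bytes, delimiter: bytes = b"\n", chunk_size: int = 1 << 20):
    """Split data on delimiter, scanning one fixed-size chunk at a time."""
    d = len(delimiter)

    # Pass 1: every chunk independently records the delimiters that start in it.
    # The window is extended by d - 1 bytes so that a delimiter straddling
    # the chunk boundary is still found by the chunk where it starts.
    offsets = []
    for start in range(0, len(data), chunk_size):
        end = min(len(data), start + chunk_size + d - 1)
        pos = data.find(delimiter, start, end)
        while pos != -1:
            offsets.append(pos)
            pos = data.find(delimiter, pos + 1, end)

    # Pass 2: turn the ordered delimiter offsets into records.
    records = []
    begin = 0
    for pos in offsets:
        if pos < begin:
            continue  # overlaps a delimiter that was already consumed
        records.append(data[begin:pos])
        begin = pos + d
    if begin < len(data):
        records.append(data[begin:])
    return records
```

With a docs file that holds one document per line, each returned record would be one line to parse into ids; the per-chunk scan in pass 1 is the step a GPU can spread across many threads.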

In [None]:
!pip install \
    --extra-index-url=https://pypi.nvidia.com \
    cudf-cu11

In [None]:
!ls /usr/local/lib/python3.10/dist-packages/cudf

In [None]:
!git clone https://github.com/rapidsai/cudf.git

In [37]:
!cd cudf && ./build.sh --help

./build.sh [clean] [libcudf] [cudf] [cudfjar] [dask_cudf] [benchmarks] [tests] [libcudf_kafka] [cudf_kafka] [custreamz] [-v] [-g] [-n] [-h] [--cmake-args=\"<args>\"]
   clean                         - remove all existing build artifacts and configuration (start
                                   over)
   libcudf                       - build the cudf C++ code only
   cudf                          - build the cudf Python package
   cudfjar                       - build cudf JAR with static libcudf using devtoolset toolchain
   dask_cudf                     - build the dask_cudf Python package
   benchmarks                    - build benchmarks
   tests                         - build tests
   libcudf_kafka                 - build the libcudf_kafka C++ code only
   cudf_kafka                    - build the cudf_kafka Python package
   custreamz                     - build the custreamz Python package
   -v                            - verbose build mode
   -g                            -

In [48]:
!tar -zcvf libcudf.tar.gz /include/cudf /lib/libcudf.so

tar: Removing leading `/' from member names
/include/cudf/
/include/cudf/aggregation.hpp
tar: Removing leading `/' from hard link targets
/include/cudf/detail/
/include/cudf/detail/transform.hpp
/include/cudf/detail/datetime_ops.cuh
/include/cudf/detail/gather.cuh
/include/cudf/detail/iterator.cuh
/include/cudf/detail/join.hpp
/include/cudf/detail/is_element_valid.hpp
/include/cudf/detail/gather.hpp
/include/cudf/detail/copy_if.cuh
/include/cudf/detail/tdigest/
/include/cudf/detail/tdigest/tdigest.hpp
/include/cudf/detail/copy_range.cuh
/include/cudf/detail/timezone.hpp
/include/cudf/detail/fill.hpp
/include/cudf/detail/sizes_to_offsets_iterator.cuh
/include/cudf/detail/structs/
/include/cudf/detail/structs/utilities.hpp
/include/cudf/detail/labeling/
/include/cudf/detail/labeling/label_segments.cuh
/include/cudf/detail/concatenate.hpp
/include/cudf/detail/get_value.cuh
/include/cudf/detail/label_bins.hpp
/include/cudf/detail/datetime.hpp
/include/cudf/detail/transpose.hpp
/include/cud

In [14]:
!cd cudf && ./build.sh libcudf

Building for the architecture of the GPU in the system...
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- The CUDA compiler identification is NVIDIA 11.8.89
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Auto detection of gpu-archs: 75
-- Found CUDAToolkit: /usr/local/cuda/include (found version "11.8.89") 
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Pe

In [47]:
!ls /lib/

apt					       libcudftest_default_stream.so  llvm-14
bfd-plugins				       libcudftestutil.a	      locale
binfmt.d				       libdfalt.a		      lsb
blt2.5					       libdfalt.la		      man-db
clang					       libdfalt.so		      mime
cmake					       libdfalt.so.0		      modprobe.d
compat-ld				       libdfalt.so.0.0.0	      modules
cpp					       libfmt.so		      modules-load.d
dbus-1.0				       libfmt.so.9		      ogdi
debug					       libfmt.so.9.1.0		      openssh
dh-elpa					       libgdal.a		      os-release
dpkg					       libgdal.so		      p7zip
emacsen-common				       libgdal.so.30		      pam.d
environment.d				       libgdal.so.30.0.3	      pkgconfig
file					       libgmock_main.so		      pkg-config.multiarch
gcc					       libgmock_main.so.1.13.0	      policykit-1
girepository-1.0			       libgmock.so		      polkit-1
git-core				       libgmock.so.1.13.0	      python2.7
gnupg					       libgtest_main.so		      python3
gnupg2					       libgtest_main.so.1.13.0	      python3.

In [16]:
!git clone https://github.com/gabime/spdlog.git

Cloning into 'spdlog'...
remote: Enumerating objects: 27412, done.
remote: Counting objects: 100% (3986/3986), done.
remote: Compressing objects: 100% (358/358), done.
remote: Total 27412 (delta 3768), reused 3671 (delta 3615), pack-reused 23426
Receiving objects: 100% (27412/27412), 40.87 MiB | 12.95 MiB/s, done.
Resolving deltas: 100% (18478/18478), done.


In [17]:
!cd spdlog && cmake -B build -S . && make -C build -j

-- The CXX compiler identification is GNU 11.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Build spdlog: 1.12.0
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Build type: Release
-- Generating example(s)
-- Generating install
-- Configuring done (0.4s)
-- Generating done (0.0s)
-- Build files have been written to: /content/spdlog/build
make: Entering directory '/content/spdlog/build'
make[1]: Entering directory '/content/spdlog/build'
make[2]: Entering directory '/content/spdlog/build'
make[2]: Leaving directory '/content/spdlog/build'
make[2]: Entering directory '/content/spdlog/build'
[ 10%] Building CXX object CMakeFiles/spdlog.dir/src/stdout_sinks.cpp.o
[ 20%] Building CXX object CMakeFiles/spdlog.dir/src/color_sinks.cpp.o

In [23]:
!cp -r ./spdlog/include/spdlog/fmt/bundled /include/spdlog/fmt/

In [24]:
!ls /include/spdlog/fmt/

bin_to_hex.h  bundled  chrono.h  compile.h  fmt.h  ostr.h  ranges.h  xchar.h


In [89]:
!cd topk && nvcc readfile.cpp -o readfile -O3 --std=c++17 -I./ -I/include -L/lib -lcudf -DGPU -DFMT_HEADER_ONLY

In [82]:
!topk/readfile STI2/translate/docs.txt chunk

file size: 3287460378
chunk size: 536870912
 fread size: 536870912
 fread size: 536870912
 fread size: 536870912
 fread size: 536870912
 fread size: 536870912
 fread size: 536870912
 fread size: 66236441
readcnt: 7
docs_size:0 doc_lens_size:0
read file cost 13985 ms 
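
The chunk-mode log above shows the docs file (about 3.3 GB) being read as six `fread` calls of 536870912 bytes (512 MiB) each, plus one shorter remainder. A minimal sketch of such a loop follows, assuming plain C stdio. `read_in_chunks` is a hypothetical name and is not taken from `readfile.cpp`; it only records the size of each read and does not parse documents.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not from readfile.cpp): read `path` in fixed-size
// chunks and return the number of bytes delivered by each fread call.
std::vector<std::size_t> read_in_chunks(const std::string& path,
                                        std::size_t chunk_size) {
    assert(chunk_size > 0);  // a zero-sized chunk would never make progress
    std::vector<std::size_t> sizes;

    std::FILE* fp = std::fopen(path.c_str(), "rb");
    if (fp == nullptr) {
        return sizes;  // file could not be opened: report no reads
    }

    std::vector<char> buffer(chunk_size);
    while (true) {
        std::size_t n = std::fread(buffer.data(), 1, buffer.size(), fp);
        if (n == 0) {
            break;  // end of file (or read error)
        }
        sizes.push_back(n);
        // A real reader would parse buffer[0..n) here.
    }

    std::fclose(fp);
    return sizes;
}
```

Because chunk boundaries fall at arbitrary byte offsets, a line-oriented parser built on this loop has to carry any partial trailing line over into the next chunk.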


TODO: use a stream pool.

In [76]:
!topk/readfile STI2/translate/docs.txt line

docs_size:7853051 doc_lens_size:7853051
read file cost 37050 ms 


In [90]:
!topk/readfile STI2/translate/docs.txt buffer

readcnt: 7 fread size: 3287461913
docs_size:7853051 doc_lens_size:7853051
read file cost 49607 ms 


# rapidsai - RAFT

Use RAFT to accelerate top-k selection on the GPU: select k -> sort -> top k.
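
The select k -> sort -> top k pipeline can be illustrated on the CPU with the standard library. This is only a sketch of the idea, not RAFT's API: `top_k` and `ScoredDoc` are hypothetical names, and `std::nth_element` stands in for the GPU select-k step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical type: (score, document index).
using ScoredDoc = std::pair<float, int>;

// CPU sketch of "select k -> sort -> top k".
// Returns the k best (score, index) pairs, highest score first;
// ties are broken by the lower document index.
std::vector<ScoredDoc> top_k(const std::vector<float>& scores, std::size_t k) {
    // Indices are stored as int, so the input must fit.
    assert(scores.size() <=
           static_cast<std::size_t>(std::numeric_limits<int>::max()));

    std::vector<ScoredDoc> items;
    items.reserve(scores.size());
    for (std::size_t i = 0; i < scores.size(); ++i) {
        items.emplace_back(scores[i], static_cast<int>(i));
    }
    k = std::min(k, items.size());

    auto better = [](const ScoredDoc& a, const ScoredDoc& b) {
        if (a.first != b.first) return a.first > b.first;
        return a.second < b.second;
    };

    // Select: move the k best items to the front, in unspecified order.
    std::nth_element(items.begin(), items.begin() + k, items.end(), better);
    items.resize(k);

    // Sort: order only the k survivors.
    std::sort(items.begin(), items.end(), better);
    return items;
}
```

Selecting before sorting keeps the sort cost proportional to k (100 in this task) rather than to the number of documents.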

1. https://github.com/rapidsai/raft/blob/branch-23.12/docs/source/build.md

In [25]:
!apt install ninja-build

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  ninja-build
0 upgraded, 1 newly installed, 0 to remove and 42 not upgraded.
Need to get 111 kB of archives.
After this operation, 358 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/universe amd64 ninja-build amd64 1.10.1-1 [111 kB]
Fetched 111 kB in 0s (248 kB/s)
Selecting previously unselected package ninja-build.
(Reading database ... 122022 files and directories currently installed.)
Preparing to unpack .../ninja-build_1.10.1-1_amd64.deb ...
Unpacking ninja-build (1.10.1-1) ...
Setting up ninja-build (1.10.1-1) ...
Processing triggers for man-db (2.10.2-1) ...


In [26]:
!git clone https://github.com/rapidsai/raft.git

Cloning into 'raft'...
remote: Enumerating objects: 30733, done.
remote: Counting objects: 100% (440/440), done.
remote: Compressing objects: 100% (244/244), done.
remote: Total 30733 (delta 246), reused 338 (delta 186), pack-reused 30293
Receiving objects: 100% (30733/30733), 12.42 MiB | 9.42 MiB/s, done.
Resolving deltas: 100% (22168/22168), done.


In [7]:
!cd raft && ./build.sh --help

./build.sh [<target> ...] [<flag> ...] [--cmake-args="<args>"] [--cache-tool=<tool>] [--limit-tests=<targets>] [--limit-bench-prims=<targets>] [--limit-bench-ann=<targets>] [--build-metrics=<filename>]
 where <target> is:
   clean            - remove all existing build artifacts and configuration (start over)
   libraft          - build the raft C++ code only. Also builds the C-wrapper library
                      around the C++ code.
   pylibraft        - build the pylibraft Python package
   raft-dask        - build the raft-dask Python package. this also requires pylibraft.
   docs             - build the documentation
   tests            - build the tests
   bench-prims      - build micro-benchmarks for primitives
   bench-ann        - build end-to-end ann benchmarks
   template         - build the example RAFT application template

 and <flag> is:
   -v                          - verbose build mode
   -g                          - build for debug
   -n                          - 

In [None]:
!ls /lib

In [36]:
!cd raft && ./build.sh libraft --compile-lib

Building for the architecture of the GPU in the system...
-- Auto detection of gpu-archs: 75
-- CPM: Using local package Thrust@1.17.2.0
-- CPM: Using local package rmm@23.12.0
-- CPM: Adding package NvidiaCutlass@2.10.0 (v2.10.0)
-- CMake Version: 3.27.7
-- CUDART: /usr/local/cuda/lib64/libcudart.so
-- CUDA Driver: /usr/local/cuda/lib64/stubs/libcuda.so
-- NVRTC: /usr/local/cuda/lib64/libnvrtc.so
-- Default Install Location: /content/raft/cpp/build/install
-- CUDA Compilation Architectures: 53;60;61;70;72;75;80;86
-- Enable caching of reference results in conv unit tests
-- Enable rigorous conv problem sizes in conv unit tests
-- Using NVCC flags: -DCUTLASS_TEST_LEVEL=0;-DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1;-DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1;-DCUTLASS_DEBUG_TRACE_LEVEL=0;$<$<BOOL:1>:-Xcompiler=-Wconversion>;$<$<BOOL:1>:-Xcompiler=-fno-strict-aliasing>
-- CUTLASS Revision: 28eb0b35
-- Configuring cublas ...
-- cuBLAS Disabled.
-- Configuring cuBLAS ... done.
-- Configuri

In [43]:
!tar -zcvf libraft.tar.gz /content/raft/cpp/build/install

tar: Removing leading `/' from member names
/content/raft/cpp/build/install/
/content/raft/cpp/build/install/lib/
/content/raft/cpp/build/install/lib/cmake/
/content/raft/cpp/build/install/lib/cmake/cuco/
/content/raft/cpp/build/install/lib/cmake/cuco/cuco-dependencies.cmake
/content/raft/cpp/build/install/lib/cmake/cuco/cuco-config-version.cmake
/content/raft/cpp/build/install/lib/cmake/cuco/cuco-config.cmake
/content/raft/cpp/build/install/lib/cmake/cuco/cuco-targets.cmake
/content/raft/cpp/build/install/lib/cmake/NvidiaCutlassTargets.cmake
/content/raft/cpp/build/install/lib/cmake/raft/
/content/raft/cpp/build/install/lib/cmake/raft/raft-compiled-lib-targets.cmake
/content/raft/cpp/build/install/lib/cmake/raft/raft-distributed-dependencies.cmake
/content/raft/cpp/build/install/lib/cmake/raft/raft-dependencies.cmake
/content/raft/cpp/build/install/lib/cmake/raft/raft-config-version.cmake
/content/raft/cpp/build/install/lib/cmake/raft/raft-compiled-static-lib-targets-release.cmake
/co

In [49]:
!ls -hg

total 1.4G
drwxr-xr-x 12 root 4.0K Nov  7 09:17 cudf
-rw-r--r--  1 root  61M Nov  7 14:59 libcudf.tar.gz
-rw-r--r--  1 root 109M Nov  7 14:50 libraft.tar.gz
drwxr-xr-x  3 root 4.0K Nov  7 09:02 __MACOSX
-rw-r--r--  1 root 303M Jun 26 07:01 nsight-systems-2023.2.3_2023.2.3.1001-1_amd64.deb
drwxr-xr-x 13 root 4.0K Nov  7 12:50 raft
drwxr-xr-x  1 root 4.0K Nov  3 18:00 sample_data
drwxr-xr-x 13 root 4.0K Nov  7 12:32 spdlog
drwxr-xr-x  5 root 4.0K Oct 13 08:58 STI2
-rw-r--r--  1 root 960M Oct 18 12:42 STI2_1017.zip
