# Search articles in Medium

## 0. Overview

We'll search the titles in the Medium dataset for the articles most similar to a search text. This differs from a traditional keyword search, which matches the literal words of the query: a semantic search looks for content with a related meaning. If you search for "**funny python demo**", it returns "**Python Coding for Kids - Setting Up For the Adventure**", even though that title contains neither "funny" nor "demo".

We will use Milvus and Towhee to run the search. Towhee extracts the semantics of the text and returns a text embedding, and the Milvus vector database stores and searches those vectors to return related articles. So we first need to install [Milvus](https://github.com/milvus-io/milvus) and [Towhee](https://github.com/towhee-io/towhee).

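Before getting started, please make sure that you have started a [Milvus service](https://milvus.io/docs/install_standalone-docker.md). This notebook was written for [milvus 2.2.10](https://milvus.io/docs/v2.2.x/install_standalone-docker.md) and [pymilvus 2.2.11](https://milvus.io/docs/release_notes.md#2210); note, however, that the install cell below pins pymilvus 2.1.1, which is the version its output reports. Keep the pymilvus version compatible with your Milvus server.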

In [16]:
#! pip install --upgrade pip
#! pip3 install -q towhee pymilvus==2.2.11
#! pip3 uninstall pymilvus -y

! pip3 install -q towhee pymilvus==2.1.1
! pip3 show pymilvus | grep -Ei 'Name:|Version:'
! pip3 show towhee | grep -Ei 'Name:|Version:'

Name: pymilvus
Version: 2.1.1
Name: towhee
Version: 1.1.3
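The text above names pymilvus 2.2.11, while the install cell pins and reports pymilvus 2.1.1. A minimal sketch for catching that kind of mismatch in code, using only the standard library's `importlib.metadata`; the helper names and the "leading components must match" rule are our own assumptions, not part of pymilvus or Towhee:

```python
from importlib.metadata import PackageNotFoundError, version


def version_matches(installed: str, expected_prefix: str) -> bool:
    """True when the leading dotted components of `installed` equal `expected_prefix`.

    Comparing components means "2.2" matches "2.2.11" but not "2.20.0" or "2.1.1".
    """
    expected_parts = expected_prefix.split(".")
    return installed.split(".")[: len(expected_parts)] == expected_parts


def installed_version_matches(package: str, expected_prefix: str):
    """Return (installed_version, matches); installed_version is None if the package is absent."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return None, False
    return installed, version_matches(installed, expected_prefix)


# The expected prefixes are illustrative: use the versions your Milvus server needs
for pkg, prefix in [("pymilvus", "2.2"), ("towhee", "1.1")]:
    found, ok = installed_version_matches(pkg, prefix)
    print(f"{pkg}: installed={found!r}, matches {prefix}.x: {ok}")
```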


## 1. Data preprocessing

The data comes from the [Cleaned Medium Articles Dataset](https://www.kaggle.com/datasets/shiyu22chen/cleaned-medium-articles-dataset) (you can download it from Kaggle). Articles with empty titles have been removed, and each title string has been converted to an embedding with the Towhee [text_embedding.dpr operator](https://towhee.io/text-embedding/dpr); the `title_vector` column holds these title embeddings.

In [17]:
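# Download data (falls back to curl when wget is not installed, e.g. on macOS)
! wget -q https://github.com/towhee-io/examples/releases/download/data/New_Medium_Data.csv || curl -sSLO https://github.com/towhee-io/examples/releases/download/data/New_Medium_Data.csv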


### 1.1 Adding embeddings for columns

This part of the notebook uses a second dataset, the [Kartverket dataset metadata](https://cdn.discordapp.com/attachments/1204433663035449384/1206537816654356480/metadata_no_format.csv?ex=65dc5ee7&is=65c9e9e7&hm=3b9a88db41103ef5393294c5eaeebb60ee2229f43724cc014d4cffc92de1f384&), which contains metadata about each dataset. The link is a temporary Discord attachment URL, so it may have expired.

The strings in these columns need to be converted to vector representations (embeddings) with the Towhee [text_embedding.dpr operator](https://towhee.io/text-embedding/dpr). Each new embedding column takes the name of its source column with `_vector` appended; for example, `title` becomes `title_vector`.

### NB: If pandas cannot read the CSV because of a delimiter parsing error

The cell below rewrites the file with "|" as the delimiter: every comma that is not inside a double-quoted field becomes "|", while commas inside quoted sentences are left unchanged.

In [18]:
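# Cell for reformatting the delimiters to "|"
import re

def replace_delimiter(input_file, output_file):
    with open(input_file, 'r', encoding='utf-8') as file:
        content = file.read()

    # Regular expression to match commas not inside double quotes: a comma matches only
    # if an even number of quote characters follows it up to the end of the file.
    # The lookahead rescans the rest of the file for every comma, so large files are slow.
    pattern = r',(?=(?:[^"]*"[^"]*")*[^"]*$)'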

    # Replace the matched commas with '|'
    new_content = re.sub(pattern, '|', content)

    with open(output_file, 'w', encoding='utf-8') as file:
        file.write(new_content)

# Replace this with your actual file paths
input_file = 'metadata.csv'
output_file = 'output_metadata_modified.csv'

replace_delimiter(input_file, output_file)
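As an alternative to the regular expression, the standard library `csv` module already splits on commas outside quoted fields, and does so in a single pass instead of rescanning the file for every comma. A minimal sketch; `rewrite_delimiter` is a hypothetical name of ours, and like `replace_delimiter` it reads and writes UTF-8:

```python
import csv


def rewrite_delimiter(input_file: str, output_file: str, new_delimiter: str = "|") -> int:
    """Re-save a comma-separated file with a different delimiter.

    csv.reader keeps commas inside double-quoted fields as part of the field,
    and csv.writer quotes any field that contains the new delimiter.
    Returns the number of rows written.
    """
    rows_written = 0
    # newline="" is what the csv module expects, so quoted multi-line fields survive
    with open(input_file, "r", encoding="utf-8", newline="") as src, \
         open(output_file, "w", encoding="utf-8", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst, delimiter=new_delimiter)
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written
```

Usage mirrors the cell above, e.g. `rewrite_delimiter('metadata.csv', 'output_metadata_modified.csv')`. Note that `csv.writer` ends rows with `\r\n` by default, which pandas reads without trouble.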


In [None]:
import pandas as pd
from towhee import pipe, ops, DataCollection

# Pipe converting text to embeddings (vectors)
embeddings_pipe = (
    pipe.input('text')
        .map('text', 'vec', ops.text_embedding.dpr(model_name='facebook/dpr-ctx_encoder-single-nq-base'))
        .output('text', 'vec')
)

# Function to compute the embedding for a single text
def compute_embeddings(text):
    return DataCollection(embeddings_pipe(text)).to_list()[0]['vec']


# replace_delimiter wrote this file as UTF-8, so read it back as UTF-8
# (reading it as latin-1 would garble non-ASCII characters such as æ, ø and å)
df = pd.read_csv('output_metadata_modified.csv', delimiter='|', encoding='utf-8')

# Fill NaN values with an empty string first, so they do not turn into the string 'nan'
df.fillna('', inplace=True)

# Recast the 'title' and 'uuid' columns to strings
recast_to_string = ['title', 'uuid']
df[recast_to_string] = df[recast_to_string].astype(str)

# Process each column and create new columns for embeddings
columns_to_vectorise = [col for col in df.columns if col not in ['schema', 'uuid', 'id', 'datasetcreationdate', 'image', 'parentId']]
for column in columns_to_vectorise:
    print(f"Processing column: {column}")
    # astype(str) makes sure numeric values are passed to the text encoder as text
    df[column + '_vector'] = df[column].astype(str).apply(compute_embeddings)


# Display the dataframe
print(df.head())

In [None]:
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection, utility

server_host = 'ebjerk.no'
server_port = '19530'

connections.connect(host=server_host, port=server_port)

def kartverket_create_milvus_collection(collection_name, dim):
    if utility.has_collection(collection_name):
        utility.drop_collection(collection_name)

    # One scalar field per column of the metadata CSV:
    # "schema","uuid","id","hierarchyLevel","title",
    # "datasetcreationdate","abstract","keyword","geoBox",
    # "Constraints","SecurityConstraints","LegalConstraints","temporalExtent","image","responsibleParty","link","metadatacreationdate","productInformation","parentId"
    # Field names must be unique; raise max_length if some values are longer than the limit.
    fields = [
            FieldSchema(name='schema', dtype=DataType.VARCHAR, max_length=50),
            FieldSchema(name='id', dtype=DataType.INT64, is_primary=True, auto_id=False),
            FieldSchema(name='uuid', dtype=DataType.VARCHAR, max_length=100),
            FieldSchema(name='hierarchyLevel', dtype=DataType.VARCHAR, max_length=50),
            FieldSchema(name='title', dtype=DataType.VARCHAR, max_length=500),

            FieldSchema(name='datasetcreationdate', dtype=DataType.VARCHAR, max_length=500),
            FieldSchema(name='abstract', dtype=DataType.VARCHAR, max_length=500),
            FieldSchema(name='keyword', dtype=DataType.VARCHAR, max_length=500),
            FieldSchema(name='geoBox', dtype=DataType.VARCHAR, max_length=500),
            FieldSchema(name='Constraints', dtype=DataType.VARCHAR, max_length=500),

            FieldSchema(name='SecurityConstraints', dtype=DataType.VARCHAR, max_length=500),
            FieldSchema(name='LegalConstraints', dtype=DataType.VARCHAR, max_length=500),
            FieldSchema(name='temporalExtent', dtype=DataType.VARCHAR, max_length=500),
            FieldSchema(name='image', dtype=DataType.VARCHAR, max_length=500),
            FieldSchema(name='responsibleParty', dtype=DataType.VARCHAR, max_length=500),

            FieldSchema(name='link', dtype=DataType.VARCHAR, max_length=500),
            FieldSchema(name='metadatacreationdate', dtype=DataType.VARCHAR, max_length=500),
            FieldSchema(name='productInformation', dtype=DataType.VARCHAR, max_length=500),
            FieldSchema(name='parentId', dtype=DataType.VARCHAR, max_length=500),

            # Milvus 2.2 allows only one vector field per collection,
            # so only the title embedding is stored here
            FieldSchema(name="title_vector", dtype=DataType.FLOAT_VECTOR, dim=dim),
    ]
    schema = CollectionSchema(fields=fields, description='search text')
    collection = Collection(name=collection_name, schema=schema)

    index_params = {
        'metric_type': "L2",
        'index_type': "IVF_FLAT",
        'params': {"nlist": 2048}
    }
    collection.create_index(field_name='title_vector', index_params=index_params)
    return collection

kartverket_collection = kartverket_create_milvus_collection('kartverket_metadata', 768)

In [None]:
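import ast
import pandas as pd

# Parse the stringified vectors with ast.literal_eval, which only accepts Python literals (safer than eval)
df = pd.read_csv('New_Medium_Data.csv', converters={'title_vector': ast.literal_eval})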
df.head()

Unnamed: 0,id,title,title_vector,link,reading_time,publication,claps,responses
0,0,The Reported Mortality Rate of Coronavirus Is ...,"[0.041732933, 0.013779674, -0.027564144, -0.01...",https://medium.com/swlh/the-reported-mortality...,13,The Startup,1100,18
1,1,Dashboards in Python: 3 Advanced Examples for ...,"[0.0039737443, 0.003020432, -0.0006188639, 0.0...",https://medium.com/swlh/dashboards-in-python-3...,14,The Startup,726,3
2,2,How Can We Best Switch in Python?,"[0.031961977, 0.00047043373, -0.018263113, 0.0...",https://medium.com/swlh/how-can-we-best-switch...,6,The Startup,500,7
3,3,Maternity leave shouldn’t set women back,"[0.032572296, -0.011148319, -0.01688577, -0.00...",https://medium.com/swlh/maternity-leave-should...,9,The Startup,460,1
4,4,Python NLP Tutorial: Information Extraction an...,"[-0.011735886, -0.016938083, -0.027233299, 0.0...",https://medium.com/swlh/python-nlp-tutorial-in...,7,The Startup,163,0
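Before creating a collection with `dim=768`, it can be worth checking that every `title_vector` really has 768 values, because Milvus rejects vectors whose length does not match the collection's `dim`. A minimal sketch, assuming the vectors are stored as Python list literals as in the preview above; `parse_vectors` is a hypothetical helper name:

```python
import ast


def parse_vectors(raw_vectors, expected_dim=768):
    """Parse stringified vectors such as "[0.1, -0.2, ...]" and check their length.

    Uses ast.literal_eval, which only accepts Python literals, instead of eval.
    Raises ValueError naming the first row whose length differs from expected_dim.
    """
    parsed = []
    for row, raw in enumerate(raw_vectors):
        vec = [float(x) for x in ast.literal_eval(raw)]
        if len(vec) != expected_dim:
            raise ValueError(f"row {row}: expected {expected_dim} values, got {len(vec)}")
        parsed.append(vec)
    return parsed
```

It works on the raw strings, for example on the column as read without the `converters` argument.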


## 2. Load Data

The titles in this dataset already come with their embeddings in `title_vector`, so the next step is to insert all of the embedding vectors, together with the other fields, into Milvus.

### Create Milvus Collection

First we need to create a collection in Milvus with the fields `id`, `title`, `title_vector`, `link`, `reading_time`, `publication`, `claps` and `responses`.

In [None]:
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection, utility

server_host = 'ebjerk.no'
server_port = '19530'

connections.connect(host=server_host, port=server_port)

def create_milvus_collection(collection_name, dim):
    if utility.has_collection(collection_name):
        utility.drop_collection(collection_name)
    
    fields = [
            FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
            FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=500),   
            FieldSchema(name="title_vector", dtype=DataType.FLOAT_VECTOR, dim=dim),
            FieldSchema(name="link", dtype=DataType.VARCHAR, max_length=500),
            FieldSchema(name="reading_time", dtype=DataType.INT64),
            FieldSchema(name="publication", dtype=DataType.VARCHAR, max_length=500),
            FieldSchema(name="claps", dtype=DataType.INT64),
            FieldSchema(name="responses", dtype=DataType.INT64)
    ]
    schema = CollectionSchema(fields=fields, description='search text')
    collection = Collection(name=collection_name, schema=schema)
    
    index_params = {
        'metric_type': "L2",
        'index_type': "IVF_FLAT",
        'params': {"nlist": 2048}
    }
    collection.create_index(field_name='title_vector', index_params=index_params)
    return collection

collection = create_milvus_collection('search_article_in_medium', 768)
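The collection above uses an IVF_FLAT index with `nlist=2048`, which splits the vectors into 2048 clusters. With the 5979 Medium titles that averages fewer than three vectors per cluster. A starting point often quoted in the Milvus index documentation is `nlist ≈ 4 × sqrt(n)`; treat it as a heuristic to tune, not a rule. At search time the `nprobe` parameter decides how many clusters are scanned, so a very large `nlist` relative to the data may need a larger `nprobe` to keep recall up. A sketch of the arithmetic; the helper names are our own:

```python
import math


def suggested_nlist(num_vectors: int, factor: float = 4.0) -> int:
    """Heuristic IVF cluster count: factor * sqrt(num_vectors), at least 1."""
    return max(1, round(factor * math.sqrt(num_vectors)))


def vectors_per_cluster(num_vectors: int, nlist: int) -> float:
    """Average number of vectors that land in each IVF cluster."""
    return num_vectors / nlist


n = 5979  # entity count reported by collection.num_entities in this notebook
print(suggested_nlist(n))                      # → 309
print(round(vectors_per_cluster(n, 2048), 2))  # → 2.92
```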

### Data to Milvus


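The insert pipeline takes the whole DataFrame as its input, uses `flat_map` to split it into rows with `df.values.tolist()`, and passes each row to the `ann_insert.milvus_client` operator, which inserts it into the `search_article_in_medium` collection. The values in each row are matched to the collection fields created earlier in order, so the DataFrame columns must appear in the same order as the fields, and `title_vector` must already hold lists of floats (the `converters` argument of `read_csv` takes care of that).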

In [None]:
from towhee import ops, pipe, DataCollection

insert_pipe = (pipe.input('df')
                   .flat_map('df', 'data', lambda df: df.values.tolist())
                   .map('data', 'res', ops.ann_insert.milvus_client(host=server_host, 
                                                                    port=server_port,
                                                                    collection_name='search_article_in_medium'))
                   .output('res')
)
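Since each row goes in as a plain list, a column in the wrong position would put values into the wrong field, or make the insert fail on a type mismatch. A minimal sketch of a hypothetical pre-insert check, `check_column_order`, that compares DataFrame column names with the collection's field names position by position:

```python
def check_column_order(df_columns, field_names):
    """Return a list of (position, column, field) mismatches.

    An empty list means the DataFrame columns line up with the collection fields.
    A missing entry on either side is reported as None.
    """
    mismatches = []
    for pos in range(max(len(df_columns), len(field_names))):
        column = df_columns[pos] if pos < len(df_columns) else None
        field = field_names[pos] if pos < len(field_names) else None
        if column != field:
            mismatches.append((pos, column, field))
    return mismatches


# Field order of the search_article_in_medium collection created above
medium_fields = ["id", "title", "title_vector", "link",
                 "reading_time", "publication", "claps", "responses"]
```

Assuming pymilvus' `collection.schema.fields` (each `FieldSchema` has a `.name`), you could call `check_column_order(list(df.columns), [f.name for f in collection.schema.fields])` before running `insert_pipe`.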




In [None]:
%time _ = insert_pipe(df)

CPU times: user 18.2 s, sys: 2.53 s, total: 20.7 s
Wall time: 4min 9s


After inserting, we call `collection.load()` to load the collection into memory for searching, then read `collection.num_entities` to get the number of vectors in the collection. It reports 5979, which confirms that the data has been loaded into Milvus.

In [None]:
collection.load()
collection.num_entities

5979
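`collection.load()` brings the collection into memory before searching. For a rough lower bound on that footprint, the raw vectors alone take `num_entities × dim × 4` bytes, because a `FLOAT_VECTOR` field stores 32-bit floats; scalar fields and index structures are extra. A sketch of the arithmetic for this collection, with a helper name of our own:

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Size of the raw float32 vectors alone, ignoring scalar fields and index overhead."""
    return num_vectors * dim * bytes_per_value


size = raw_vector_bytes(5979, 768)
print(size, f"{size / 2**20:.1f} MiB")  # → 18367488 17.5 MiB
```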

## 3. Search by title embedding

### Search one text in Milvus


Retrieval follows the same steps: generate the text embedding of the query, normalize it, search Milvus for similar vectors, and return the results, each with an `id` (the primary key) and a `score`. Because the index uses the L2 metric, the score is a distance, so smaller values mean closer matches. For example, we can search for "funny python demo":

In [None]:
import numpy as np

search_pipe = (pipe.input('query')
                    .map('query', 'vec', ops.text_embedding.dpr(model_name="facebook/dpr-ctx_encoder-single-nq-base"))
                    .map('vec', 'vec', lambda x: x / np.linalg.norm(x, axis=0))
                    .flat_map('vec', ('id', 'score'), ops.ann_search.milvus_client(host=server_host, 
                                                                                   port=server_port,
                                                                                   collection_name='search_article_in_medium'))  
                    .output('query', 'id', 'score')
               )

res = search_pipe('funny python demo')
DataCollection(res).show()

query,id,score
funny python demo,3897,0.3737611174583435
funny python demo,1342,0.4368064999580383
funny python demo,1832,0.4572384059429168
funny python demo,5671,0.4593276083469391
funny python demo,1752,0.4645397365093231
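The search pipeline divides each query embedding by its norm before searching the L2 index. For unit-length vectors a and b, ‖a − b‖² = 2 − 2·cos(a, b), so ordering results by smallest L2 distance gives the same ranking as ordering by largest cosine similarity. That equivalence assumes the stored `title_vector` values are unit-length too, which this notebook does not show either way. A standard-library sketch that checks the identity on made-up vectors:

```python
import math


def normalize(vec):
    """Scale a vector to unit length (like x / np.linalg.norm(x) in the pipeline)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


a = normalize([0.3, -1.2, 2.0])
b = normalize([1.0, 0.5, 0.25])
# For unit vectors, squared L2 distance equals 2 - 2 * cosine similarity
print(squared_l2(a, b), 2 - 2 * cosine(a, b))
```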


### Search multiple texts in Milvus

We can also search for several texts in one call: pass a list such as `['funny python demo', 'AI in data analysis']` to `search_pipe.batch`, which runs each query against Milvus and returns one result per query:

In [None]:
res = search_pipe.batch(['funny python demo', 'AI in data analysis'])
# Use 'result' rather than 're' so the re module imported earlier is not shadowed
for result in res:
    DataCollection(result).show()

query,id,score
funny python demo,3897,0.3737611174583435
funny python demo,1342,0.4368064999580383
funny python demo,1832,0.4572384059429168
funny python demo,5671,0.4593276083469391
funny python demo,1752,0.4645397365093231


query,id,score
AI in data analysis,3493,0.2443668991327285
AI in data analysis,4542,0.2485119104385376
AI in data analysis,2649,0.284042477607727
AI in data analysis,4539,0.3186832070350647
AI in data analysis,3812,0.3224286139011383


### Search text and return multiple fields

If we want more information back from a search, we can set the `output_fields` parameter of the [ann_search.milvus operator](https://towhee.io/ann-search/milvus). For example, in addition to `id` and `score`, we can also return `title`, `link`, `reading_time`, `publication`, `claps` and `responses`:

In [None]:
search_pipe1 = (pipe.input('query')
                    .map('query', 'vec', ops.text_embedding.dpr(model_name="facebook/dpr-ctx_encoder-single-nq-base"))
                    .map('vec', 'vec', lambda x: x / np.linalg.norm(x, axis=0))
                    .flat_map('vec', ('id', 'score', 'title'), ops.ann_search.milvus_client(host=server_host, 
                                                                                   port=server_port,
                                                                                   collection_name='search_article_in_medium',
                                                                                   output_fields=['title']))  
                    .output('query', 'id', 'score', 'title')
               )

res = search_pipe1('funny python demo')
DataCollection(res).show()

query,id,score,title
funny python demo,3897,0.3737611174583435,Python Coding for Kids — Setting Up For the Adventure
funny python demo,1342,0.4368064999580383,How to Design Professional Venn Diagrams in Python
funny python demo,1832,0.4572384059429168,How to mock AWS services for rapid local development.
funny python demo,5671,0.4593276083469391,Adventure into Machine Learning using Python
funny python demo,1752,0.4645397365093231,Custom neural networks in Keras: a street fighter’s guide to build a graphCNN


In [None]:
# Milvus search with multiple output fields
search_pipe2 = (pipe.input('query')
                    .map('query', 'vec', ops.text_embedding.dpr(model_name="facebook/dpr-ctx_encoder-single-nq-base"))
                    .map('vec', 'vec', lambda x: x / np.linalg.norm(x, axis=0))
                    .flat_map('vec', ('id', 'score', 'title', 'link', 'reading_time', 'publication', 'claps', 'responses'), 
                                       ops.ann_search.milvus_client(host=server_host, 
                                                                    port=server_port,
                                                                    collection_name='search_article_in_medium',
                                                                    output_fields=['title', 'link', 'reading_time', 'publication', 'claps', 'responses'], 
                                                                    limit=5))  
                    .output('query', 'id', 'score', 'title', 'link', 'reading_time', 'publication', 'claps', 'responses')
               )

res = search_pipe2('funny python demo')
DataCollection(res).show()

query,id,score,title,link,reading_time,publication,claps,responses
funny python demo,3897,0.3737611174583435,Python Coding for Kids — Setting Up For the Adventure,https://medium.com/swlh/python-coding-for-kids-setting-up-for-the-adventure-9be4bef6b24e,14,The Startup,119,2
funny python demo,1342,0.4368064999580383,How to Design Professional Venn Diagrams in Python,https://towardsdatascience.com/how-to-design-professional-venn-diagrams-in-python-693c9ed2c288,6,Towards Data Science,97,1
funny python demo,1832,0.4572384059429168,How to mock AWS services for rapid local development.,https://medium.com/swlh/how-to-mock-aws-services-for-rapid-local-development-3d07581ffc3a,3,The Startup,84,0
funny python demo,5671,0.4593276083469391,Adventure into Machine Learning using Python,https://towardsdatascience.com/adventure-into-machine-learning-using-python-7a85fce81b7d,14,Towards Data Science,25,0
funny python demo,1752,0.4645397365093231,Custom neural networks in Keras: a street fighter’s guide to build a graphCNN,https://towardsdatascience.com/custom-neural-networks-in-keras-a-street-fighters-guide-to-build-a-graphcnn-e91f6b05f12e,7,Towards Data Science,55,0


### Search text with a filter expression


We can also set a boolean expression to filter the candidates. For example, setting `expr='title like "Python%"'` restricts the search to articles whose titles start with "Python":

In [None]:
search_pipe3 = (pipe.input('query')
                    .map('query', 'vec', ops.text_embedding.dpr(model_name="facebook/dpr-ctx_encoder-single-nq-base"))
                    .map('vec', 'vec', lambda x: x / np.linalg.norm(x, axis=0))
                    .flat_map('vec', ('id', 'score', 'title', 'link', 'reading_time', 'publication', 'claps', 'responses'), 
                                       ops.ann_search.milvus_client(host=server_host, 
                                                                    port=server_port,
                                                                    collection_name='search_article_in_medium',
                                                                    expr='title like "Python%"',
                                                                    output_fields=['title', 'link', 'reading_time', 'publication', 'claps', 'responses'], 
                                                                    limit=5))  
                    .output('query', 'id', 'score', 'title', 'link', 'reading_time', 'publication', 'claps', 'responses')
               )

res = search_pipe3('funny python demo')
DataCollection(res).show()

query,id,score,title,link,reading_time,publication,claps,responses
funny python demo,3897,0.3737611174583435,Python Coding for Kids — Setting Up For the Adventure,https://medium.com/swlh/python-coding-for-kids-setting-up-for-the-adventure-9be4bef6b24e,14,The Startup,119,2
funny python demo,4644,0.4937489628791809,Python for Finance — The Complete Beginner’s Guide,https://towardsdatascience.com/python-for-finance-the-complete-beginners-guide-764276d74cef,8,Towards Data Science,292,5
funny python demo,2736,0.4956967830657959,Python for Beginners — Basics,https://towardsdatascience.com/python-for-beginners-basics-7ac6247bb4f4,7,Towards Data Science,11,0
funny python demo,1667,0.5019431114196777,Python — How to measure thread execution time in multithreaded application?,https://medium.com/swlh/python-how-to-measure-thread-execution-time-in-multithreaded-application-f4b2e2112091,6,The Startup,55,0
funny python demo,1298,0.5166990756988525,Python Testing with a mock database (SQL),https://medium.com/swlh/python-testing-with-a-mock-database-sql-68f676562461,4,The Startup,51,0


## 4. Query data in Milvus

So far we have run text retrieval: searching for "funny python demo" returns articles such as "Python Coding for Kids - Setting Up For the Adventure".

We can also run a plain query on the data with the `collection.query` interface by setting `expr` and `output_fields`. For example, the query below returns articles published in Towards Data Science that have more than 3000 claps and a reading time of less than 15 minutes:

In [None]:
collection.query(
  expr = 'claps > 3000 && reading_time < 15 && publication like "Towards Data Science%"', 
  output_fields = ['id', 'title', 'link', 'reading_time', 'publication', 'claps', 'responses'],
  consistency_level='Strong'
)

[{'claps': 4400,
  'responses': 20,
  'id': 2572,
  'title': 'Top 3 Python Functions You Don’t Know About (Probably)',
  'link': 'https://towardsdatascience.com/top-3-python-functions-you-dont-know-about-probably-978f4be1e6d',
  'reading_time': 4,
  'publication': 'Towards Data Science'},
 {'claps': 3500,
  'responses': 8,
  'id': 4639,
  'title': 'Do You Know Python Has A Built-In Database?',
  'link': 'https://towardsdatascience.com/do-you-know-python-has-a-built-in-database-d553989c87bd',
  'reading_time': 6,
  'publication': 'Towards Data Science'},
 {'claps': 4600,
  'responses': 73,
  'id': 5766,
  'title': 'Machine Learning Engineers Will Not Exist In 10 Years.',
  'link': 'https://towardsdatascience.com/machine-learning-engineers-will-not-exist-in-10-years-c9cbbf4472f3',
  'reading_time': 6,
  'publication': 'Towards Data Science'},
 {'claps': 5200,
  'responses': 17,
  'id': 913,
  'title': 'I Thought I Was Mastering Python Until I Discovered These Tricks',
  'link': 'https://
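The `expr` strings used for filtering are plain text, so composing them by hand gets error-prone as the conditions change. A minimal sketch of a hypothetical helper, `build_expr` (our own name and parameters, not a pymilvus or Towhee API), that assembles the same kind of expression as the query above from optional conditions joined with `&&`:

```python
def build_expr(claps_gt=None, reading_time_lt=None, publication_prefix=None):
    """Compose a Milvus filter expression from optional conditions.

    The publication filter is a `like "<prefix>%"` prefix match. Double quotes
    in the prefix are rejected rather than escaped, since this sketch does not
    implement Milvus string escaping. Returns "" when no condition is given.
    """
    clauses = []
    if claps_gt is not None:
        clauses.append(f"claps > {int(claps_gt)}")
    if reading_time_lt is not None:
        clauses.append(f"reading_time < {int(reading_time_lt)}")
    if publication_prefix is not None:
        if '"' in publication_prefix:
            raise ValueError("publication_prefix must not contain double quotes")
        clauses.append(f'publication like "{publication_prefix}%"')
    return " && ".join(clauses)
```

For example, `collection.query(expr=build_expr(claps_gt=3000, reading_time_lt=15, publication_prefix="Towards Data Science"), output_fields=['id', 'title', 'claps'])` would send the same filter as the query above.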

## 5. Demo of semantic search

In [None]:
# Variables specifying which vector column and collection to perform the ANN comparison against
vector_columns = ['title_vector']
collection_name = 'search_article_in_medium'

# Which columns to return for display
response_output = ['title', 'link', 'reading_time', 'publication', 'claps', 'responses']


demo_pipe = (pipe.input('query')
                    .map('query', 'vec', ops.text_embedding.dpr(model_name="facebook/dpr-ctx_encoder-single-nq-base"))
                    .map('vec', 'vec', lambda x: x / np.linalg.norm(x, axis=0))
                    .flat_map('vec', ('id', 'score', 'title', 'link', 'reading_time', 'publication', 'claps', 'responses'), 
                                       ops.ann_search.milvus_client(host=server_host, 
                                                                    port=server_port,
                                                                    collection_name=collection_name,
                                                                    vector_field=vector_columns,
                                                                    output_fields=response_output, 
                                                                    limit=5))  
                    .output(*['query', 'score'], *response_output)
               )

print('\n"Just do it" search:')
res_semantic1 = demo_pipe('Just do it')
DataCollection(res_semantic1).show()

print('\n"Assemble" search:')
res_semantic2 = demo_pipe('Assemble')
DataCollection(res_semantic2).show()

print('\n"Show me how i can become a data analyst" search:')
res_semantic3 = demo_pipe('Show me how i can become a data analyst')
DataCollection(res_semantic3).show()


"Just do it" search:


query,score,title,link,reading_time,publication,claps,responses
Just do it,0.5165345072746277,Tune Into Your Body’s Rhythms to Create Your Best Writing Routine,https://medium.com/swlh/tune-into-your-bodys-rhythms-to-create-your-best-writing-routine-4421d97b897c,7,The Startup,96,0
Just do it,0.521692156791687,Get your idea out there!,https://medium.com/swlh/get-your-idea-out-there-d396b9443d2f,11,The Startup,134,0
Just do it,0.5453013181686401,Fundraising – Getting Your Mind Right,https://medium.com/swlh/fundraising-getting-your-mind-right-76e6864670b,4,The Startup,51,0
Just do it,0.5468644499778748,Think Like a Boss and You Will Become One,https://medium.com/swlh/think-like-a-boss-and-you-will-become-one-9236fc5c4b79,8,The Startup,292,1
Just do it,0.5481520295143127,You Can Learn How to Be Creative.,https://medium.com/swlh/you-can-learn-how-to-be-creative-f1894da4bac5,4,The Startup,89,2



"Assemble" search:


query,score,title,link,reading_time,publication,claps,responses
Assemble,0.5605491399765015,"Create A Synthetic Image Dataset — The “What”, The “Why” and The “How”",https://towardsdatascience.com/create-a-synthetic-image-dataset-the-what-the-why-and-the-how-f820e6b6f718,7,Towards Data Science,50,0
Assemble,0.5605491399765015,"Create A Synthetic Image Dataset — The “What”, The “Why” and The “How”",https://towardsdatascience.com/create-a-synthetic-image-dataset-the-what-the-why-and-the-how-f820e6b6f718,7,Towards Data Science,50,0
Assemble,0.5768184661865234,The Planning Process for Your Organization,https://medium.com/swlh/the-planning-process-for-your-organization-acb61c785dfd,4,The Startup,76,0
Assemble,0.5851483345031738,Preparing the data for Transformer pre-training — a write-up,https://towardsdatascience.com/preparing-the-data-for-transformer-pre-training-a-write-up-67a9dc0cae5a,3,Towards Data Science,36,0
Assemble,0.5889052152633667,Creating Async Vue Components,https://medium.com/swlh/creating-async-vue-components-f1c60050270f,3,The Startup,295,0



"Show me how i can become a data analyst" search:


query,score,title,link,reading_time,publication,claps,responses
Show me how i can become a data analyst,0.2777085900306701,How I see a lesson from Flash holds a future of prototyping,https://uxdesign.cc/how-i-see-a-lesson-from-flash-holds-a-future-of-prototyping-9ed1e939232d,11,UX Collective,63,0
Show me how i can become a data analyst,0.2872746884822845,Find your first job as a Data Scientist,https://towardsdatascience.com/find-your-first-job-as-a-data-scientist-81e4401fe5bf,5,Towards Data Science,248,0
Show me how i can become a data analyst,0.2886537313461303,What You’ll Learn in 1 Year as a Data Scientist,https://towardsdatascience.com/what-youll-learn-in-1-year-as-a-data-scientist-b69061639653,9,Towards Data Science,161,2
Show me how i can become a data analyst,0.2963539361953735,Why I love being a data scientist,https://towardsdatascience.com/why-i-love-being-a-data-scientist-b4e2de7292e7,6,Towards Data Science,183,1
Show me how i can become a data analyst,0.2991792261600494,Make Your Data Models Into Websites,https://towardsdatascience.com/make-your-data-models-into-websites-d7260956c6d7,6,Towards Data Science,95,0


In [None]:
# Search by questions

#question_0 = "How can modern software enhance the efficiency of complex computational tasks?"
question_1 = "What are the latest breakthroughs in speech recognition for machines?"
#question_2 = "In what ways can an individual improve their creative expression?"
#question_3 = "What are the key principles in creating a user-friendly digital interface?"
#question_4 = "What factors should entrepreneurs consider for successful business growth in a digital age?"
#question_5 = "What foundational skills are essential for analyzing large datasets effectively?"
#question_6 = "What should newcomers understand before investing in cryptocurrency?"
question_7 = "How does predictive modeling transform decision-making in industries?"
#question_8 = "What strategies are crucial for a brand to stand out in a competitive market?"
question_9 = "How can a company cultivate a culture of trust and innovation among its employees?"

print(f'\n"{question_1}" search:')
res_question1 = demo_pipe(question_1)
DataCollection(res_question1).show()

print(f'\n"{question_7}" search:')
res_question2 = demo_pipe(question_7)
DataCollection(res_question2).show()

print(f'\n"{question_9}" search:')
res_question3 = demo_pipe(question_9)
DataCollection(res_question3).show()


"What are the latest breakthroughs in speech recognition for machines?" search:


query,score,title,link,reading_time,publication,claps,responses
What are the latest breakthroughs in speech recognition for machines?,0.2219657897949218,What do various countries’ healthcare capacities look like?,https://towardsdatascience.com/what-do-various-countries-healthcare-capacities-look-like-1581896a0601,8,Towards Data Science,1400,15
What are the latest breakthroughs in speech recognition for machines?,0.2360749095678329,What can we do to humanise our user’s experience?,https://uxdesign.cc/what-can-we-do-to-humanise-our-users-experience-98f6fda33609,5,UX Collective,31,0
What are the latest breakthroughs in speech recognition for machines?,0.2380344867706298,What are the Benefits and Barriers of Big Data Analytics in Controlling?,https://towardsdatascience.com/data-science-in-the-real-world-d53af9ba7230,7,Towards Data Science,40,0
What are the latest breakthroughs in speech recognition for machines?,0.2490801960229873,Why are neural networks so powerful?,https://towardsdatascience.com/why-are-neural-networks-so-powerful-bc308906696c,8,Towards Data Science,380,1
What are the latest breakthroughs in speech recognition for machines?,0.2508961856365204,Will Coding Be Useless After Artificial Intelligence Can Write Flawless Code?,https://towardsdatascience.com/will-coding-be-useless-after-artificial-intelligence-can-write-flawless-code-e2187c151a3d,4,Towards Data Science,65,0



"How does predictive modeling transform decision-making in industries?" search:


query,score,title,link,reading_time,publication,claps,responses
How does predictive modeling transform decision-making in industries?,0.1813471913337707,How does data science create value for firms?,https://towardsdatascience.com/how-does-data-science-create-value-for-firms-a3e3e5ca86e3,19,Towards Data Science,24,0
How does predictive modeling transform decision-making in industries?,0.19828462600708,How can Machine Learning algorithms include better Causality?,https://medium.com/swlh/how-can-machine-learning-algorithms-include-better-causality-e869ca60e54d,9,The Startup,437,2
How does predictive modeling transform decision-making in industries?,0.19828462600708,How can Machine Learning algorithms include better Causality?,https://medium.com/swlh/how-can-machine-learning-algorithms-include-better-causality-e869ca60e54d,9,The Startup,437,2
How does predictive modeling transform decision-making in industries?,0.202661782503128,How to perform Data Analysis using the CRISP-DM approach?,https://towardsdatascience.com/how-to-perform-data-analysis-using-the-crisp-dm-approach-201708f220b2,6,Towards Data Science,26,0
How does predictive modeling transform decision-making in industries?,0.2252297401428222,Can machine learning help build a better stock portfolio?,https://towardsdatascience.com/can-machine-learning-help-build-a-better-stock-portfolio-8e4b3334a49,8,Towards Data Science,180,1



"How can a company cultivate a culture of trust and innovation among its employees?" search:


query,score,title,link,reading_time,publication,claps,responses
How can a company cultivate a culture of trust and innovation among its employees?,0.1816233992576599,How to Build an Outstanding Company Culture?,https://medium.com/swlh/a-human-oriented-framework-to-build-a-great-company-culture-d97ff49e6766,4,The Startup,45,0
How can a company cultivate a culture of trust and innovation among its employees?,0.1916501969099044,How Can Organizations Learn Effectively?,https://medium.com/swlh/the-keys-to-organizational-learning-9ba46bbcd7bc,7,The Startup,208,1
How can a company cultivate a culture of trust and innovation among its employees?,0.2152410745620727,Why is a clear value proposition essential for any startup or growing business?,https://uxdesign.cc/why-is-a-clear-value-proposition-essential-for-any-startup-or-growing-business-f0fce3446a3f,3,UX Collective,83,0
How can a company cultivate a culture of trust and innovation among its employees?,0.2346673756837844,How does data science create value for firms?,https://towardsdatascience.com/how-does-data-science-create-value-for-firms-a3e3e5ca86e3,19,Towards Data Science,24,0
How can a company cultivate a culture of trust and innovation among its employees?,0.2560636699199676,What Makes a Social Media Campaign Innovative?,https://medium.com/swlh/what-makes-a-social-media-campaign-innovative-2f65d8c51ab,4,The Startup,51,0
