Example implementation of parallel decoding from *Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding*.
This approach seems effective for queries that have independent subcomponents. For example, in "What are some practical tips for individuals to reduce their carbon emissions?", each practical tip is independent, so you first have an LLM return a skeleton like:
1. Energy conservation.
2. Efficient transportation.
3. Home energy efficiency.
4. Reduce water consumption.
5. Sustainable diet.
6. Sustainable travel.
Then you send parallel requests for the subcomponents, one request per "point".
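The fan-out stage above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: `call_llm` is a hypothetical stand-in for a real model call (e.g. via the OpenAI API), and in the full technique the skeleton itself would also come from a first LLM request rather than being passed in.

```python
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM API call.
    return f"Expanded: {prompt}"

def expand_skeleton(question: str, skeleton: list[str]) -> str:
    # Stage 2 of Skeleton-of-Thought: expand every skeleton point
    # in parallel, with one independent request per point.
    def expand(point: str) -> str:
        prompt = f"Question: {question} Briefly elaborate on this point only: {point}"
        return call_llm(prompt)

    with ThreadPoolExecutor(max_workers=len(skeleton)) as pool:
        expansions = list(pool.map(expand, skeleton))

    # Stitch the expanded points back together in skeleton order.
    return "\n".join(f"{i}. {text}" for i, text in enumerate(expansions, start=1))
```

Because each point is expanded independently, total latency is roughly one skeleton request plus one (concurrent) expansion request, instead of one long sequential generation.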
You could also imagine this working well for code, where you first ask for a skeleton of the function interfaces, then dispatch a request to write the internals of each function. This assumes the functions don't depend on each other in any way (e.g., no shared global state).
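For the code variant, the two stages might use prompts along these lines. The prompt wording and helper below are illustrative assumptions, not part of the paper or this repo:

```python
# Stage 1 (hypothetical prompt): ask only for function signatures.
SKELETON_PROMPT = (
    "Write only the function signatures (docstrings included, bodies as `pass`) "
    "needed to solve this task: {task}"
)

# Stage 2 (hypothetical prompt): implement one function per request.
BODY_PROMPT = (
    "Given these signatures:\n{skeleton}\n"
    "Implement only this function, without global state:\n{signature}"
)

def body_prompts(skeleton: str, signatures: list[str]) -> list[str]:
    # One independent request per function body, ready to dispatch in parallel.
    return [BODY_PROMPT.format(skeleton=skeleton, signature=sig) for sig in signatures]
```

Each Stage 2 prompt sees the full skeleton for context but is asked to fill in exactly one function, which is what makes the requests independent.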
However, this approach performs poorly on math problems and Fermi estimates, where you want sequential chain-of-thought reasoning.
- Install Poetry if it's not already installed:

  ```shell
  curl -sSL https://install.python-poetry.org | bash
  ```

- Install project dependencies with Poetry:

  ```shell
  poetry install
  ```

- Activate Poetry's shell to use the project-specific virtual environment:

  ```shell
  poetry shell
  ```

- Set your OpenAI API key as an environment variable, then run the script:

  ```shell
  export OPENAI_API_KEY='your-api-key'
  python3 parallel_decoding.py
  ```