Homepage: https://csed.zheqiaoc.com
China Social Event Database (CSED) is a timeline-based event aggregation and analysis tool designed to record daily social dynamics and online public opinion.
The development of China Social Event Database (CSED) stems from two questions:
- What is happening in China every day?
- What information are people receiving on the Chinese internet?
Most social media or political science studies seem to focus more on specific events rather than the overall distribution of information. Therefore, I hope to aggregate and analyze data at the event level through this project.
- Automatically aggregates information daily and displays it in a timeline format.
- Government response detection: entries marked with a yellow star contain government responses. (This feature exists in the code but is no longer displayed in the frontend.)
- Click on post titles to jump to the original Weibo posts.
- Good support for both mobile and desktop platforms.
- Provide a data download page or an API.
- Add more features like event classification, event mapping, etc.
- Add more data sources like WeChat Official Accounts, Douyin, etc.
- Open source the code. (Sorry, the code quality isn't great; I didn't want to open source it before a thorough review.)
Four preparations are needed before deployment:
- Find a Weibo crawler and configure it. I use weibo-crawler: configure userlist and config.json, then crawl the data you need.
- Install Node.js and npm
- Install MongoDB
- Get your own OpenAI API key
# Clone the repository
git clone https://github.com/zheqiaochen/China-Social-Event-Database-CSED.git
# Enter the project directory
cd China-Social-Event-Database-CSED
# Install dependencies
uv sync
pnpm install
Create a .env file in the root directory and set the MongoDB address (default is mongodb://localhost:27017) and your OpenAI API key, in the following format:
MONGO_URI=mongodb://localhost:27017/
API_KEY=sk-...
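Before starting the backend, it can help to confirm that the .env file actually contains both settings. The following is a minimal sketch, not part of the project: the helper names `parse_env` and `missing_keys` are hypothetical, and it only assumes the two keys shown above (`MONGO_URI` and `API_KEY`) in simple `KEY=VALUE` form.

```python
from pathlib import Path

# Keys the README asks you to configure in .env
REQUIRED_KEYS = ("MONGO_URI", "API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" or ":"
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    missing = missing_keys(parse_env(text))
    print("OK" if not missing else f"Missing in .env: {', '.join(missing)}")
```

Run it from the project root. It only checks that the keys are present; it does not verify that MongoDB is reachable or that the API key is valid.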
# Start backend server
python -m backend.main
# Run the following commands in order
# Summary
curl -X POST http://0.0.0.0:8888/api/process/summary
# Embedding
curl -X POST http://0.0.0.0:8888/api/process/embedding
# Clustering
curl -X POST http://0.0.0.0:8888/api/process/cluster/hdbscan
# Generate cluster titles
curl -X POST http://0.0.0.0:8888/api/process/cluster/titles
# Run the following command to delete data (default deletes data older than 7 days that failed to be clustered)
curl -X POST http://0.0.0.0:8888/api/process/delete_old
# Run the following command to archive inactive events (default archives events inactive for more than 7 days)
curl -X POST http://0.0.0.0:8888/api/process/archive_inactive_events
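If you would rather not type the four processing commands by hand each day, they can be wrapped in a small script. This is a rough sketch using only the Python standard library, not something shipped with the project. It assumes the endpoints accept an empty POST body, as the curl commands above suggest, and that each request returns only after its step has finished; `run_pipeline` and `post` are hypothetical names.

```python
import urllib.request

# Base URL taken from the curl commands above; change it if your backend
# listens elsewhere.
BASE_URL = "http://0.0.0.0:8888"

# Processing endpoints, in the order the README says to run them.
PIPELINE = [
    "/api/process/summary",
    "/api/process/embedding",
    "/api/process/cluster/hdbscan",
    "/api/process/cluster/titles",
]


def post(url: str) -> int:
    """Send an empty POST request and return the HTTP status code.

    urllib raises HTTPError for 4xx/5xx responses, so this normally
    returns only for successful requests.
    """
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request) as response:
        return response.status


def run_pipeline(base_url: str = BASE_URL, steps=PIPELINE, send=post) -> list[str]:
    """Call each step in order and stop at the first non-2xx status."""
    finished = []
    for path in steps:
        status = send(base_url + path)
        if not 200 <= status < 300:
            raise RuntimeError(f"{path} returned HTTP {status}")
        finished.append(path)
    return finished
```

With the backend running, calling `run_pipeline()` executes summary, embedding, clustering, and title generation in sequence. The `send` parameter exists so the ordering logic can be exercised without a live server. The delete and archive endpoints are left out on purpose, since they remove or move data and are better run deliberately.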
# After running, you can start the frontend to see the effect
pnpm dev
I am currently a student and do not have enough time to maintain and develop this project. If you are interested, feel free to drop me an email. You can find the contact information on this page: About.
