fix: re-order JSON and CSV in the lesson about saving data (Python course) #1658

honzajavorek · 2025-06-27T14:13:21Z

When working on #1584 I realized it'd be better if the lesson started with JSON and continued with CSV, not the other way around.

In Python the order doesn't matter, but in JavaScript it's easier to start with JSON, which is built-in, and only then move to CSV, which requires an additional library. So for the sake of having both lessons aligned, I want to change the order in the Python lesson, too.

So most of the diff is just the two sections reversed, and the two exercises reversed. I made only a few additional changes to the wording.

Making this change because in Python it doesn't matter and in JavaScript it's easier to start with JSON, which is built-in, and only then move to CSV, which requires an additional library.

cursor

Bug: Module Import Timing and Export Instructions Mismatch

The lesson contains two main instructional inconsistencies:

- The `csv` module is prematurely imported in the JSON section's code example, appearing before the CSV format is introduced, which can confuse students.
- The instructions for adding data exports are contradictory: the JSON section tells users to "replace" the `print(data)` line, while the CSV section later says to "add one more data export", creating ambiguity about whether exports should coexist or replace each other.
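
To make the second point concrete, here is a minimal sketch (not taken from the PR or the lesson) of how the end of the scraper might look if both exports are meant to coexist. The `data` list is a hypothetical stand-in for the scraped products, reusing values from the lesson's sample output, and `output_dir` plus `newline=""` are assumptions of this sketch, not something the lesson prescribes:

```python
import csv
import json
import tempfile
from decimal import Decimal
from pathlib import Path

# Hypothetical stand-in for the `data` list the scraper builds earlier.
data = [
    {"title": "JBL Flip 4 Waterproof Portable Bluetooth Speaker",
     "min_price": Decimal("74.95"), "price": Decimal("74.95")},
    {"title": "Sony XBR-950G BRAVIA 4K HDR Ultra HD TV",
     "min_price": Decimal("1398.00"), "price": None},
]

# Assumption: write into a temporary directory so the sketch leaves the
# working directory untouched; the lesson writes to the current directory.
output_dir = Path(tempfile.mkdtemp())


def serialize(obj):
    # Same idea as in the lesson: store Decimal values as strings.
    if isinstance(obj, Decimal):
        return str(obj)
    raise TypeError("Object not JSON serializable")


# First export: JSON takes the place of the original print(data) line.
with open(output_dir / "products.json", "w") as file:
    json.dump(data, file, default=serialize)

# Second export: CSV is appended after the JSON export, so both files exist.
# newline="" follows the csv module's recommendation for files passed to writers.
with open(output_dir / "products.csv", "w", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=["title", "min_price", "price"])
    writer.writeheader()
    writer.writerows(data)
```

Read this way, "replace" applies only to the `print(data)` line and "add one more data export" means appending the CSV block after the JSON one, but the lesson text would need to say so explicitly.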

sources/academy/webscraping/scraping_basics_python/08_saving_data.md#L86-L186

apify-docs/sources/academy/webscraping/scraping_basics_python/08_saving_data.md

Lines 86 to 186 in a8e57c7

    
    ```py
    import httpx
    from bs4 import BeautifulSoup
    from decimal import Decimal
    import csv
    # highlight-next-line
    import json
    ```

    Next, instead of printing the data, we'll finish the program by exporting it to JSON. Let's replace the line `print(data)` with the following:

    ```py
    with open("products.json", "w") as file:
        json.dump(data, file)
    ```

    That's it! If we run the program now, it should also create a `products.json` file in the current working directory:

    ```text
    $ python main.py
    Traceback (most recent call last):
      ...
        raise TypeError(f'Object of type {o.__class__.__name__} '
    TypeError: Object of type Decimal is not JSON serializable
    ```

    Ouch! JSON supports integers and floating-point numbers, but there's no guidance on how to handle `Decimal`. To maintain precision, it's common to store monetary values as strings in JSON files. But this is a convention, not a standard, so we need to handle it manually. We'll pass a custom function to `json.dump()` to serialize objects that it can't handle directly:

    ```py
    def serialize(obj):
        if isinstance(obj, Decimal):
            return str(obj)
        raise TypeError("Object not JSON serializable")

    with open("products.json", "w") as file:
        json.dump(data, file, default=serialize)
    ```

    If we run our scraper now, it won't display any output, but it will create a `products.json` file in the current working directory, which contains all the data about the listed products:

    <!-- eslint-skip -->
    ```json title=products.json
    [{"title": "JBL Flip 4 Waterproof Portable Bluetooth Speaker", "min_price": "74.95", "price": "74.95"}, {"title": "Sony XBR-950G BRAVIA 4K HDR Ultra HD TV", "min_price": "1398.00", "price": null}, ...]
    ```

    If you skim through the data, you'll notice that the `json.dump()` function handled some potential issues, such as escaping double quotes found in one of the titles by adding a backslash:

    ```json
    {"title": "Sony SACS9 10\" Active Subwoofer", "min_price": "158.00", "price": "158.00"}
    ```

    :::tip Pretty JSON

    While a compact JSON file without any whitespace is efficient for computers, it can be difficult for humans to read. You can pass `indent=2` to `json.dump()` for prettier output.

    Also, if your data contains non-English characters, set `ensure_ascii=False`. By default, Python encodes everything except [ASCII](https://en.wikipedia.org/wiki/ASCII), which means it would save [Bún bò Nam Bô](https://vi.wikipedia.org/wiki/B%C3%BAn_b%C3%B2_Nam_B%E1%BB%99) as `B\\u00fan b\\u00f2 Nam B\\u00f4`.

    :::

    ## Saving data as CSV

    The CSV format is popular among data analysts because a wide range of tools can import it, including spreadsheets apps like LibreOffice Calc, Microsoft Excel, Apple Numbers, and Google Sheets.

    In Python, we can read and write CSV using the [`csv`](https://docs.python.org/3/library/csv.html) standard library module. First let's try something small in the Python's interactive REPL to familiarize ourselves with the basic usage:

    ```py
    >>> import csv
    >>> with open("data.csv", "w") as file:
    ...     writer = csv.DictWriter(file, fieldnames=["name", "age", "hobbies"])
    ...     writer.writeheader()
    ...     writer.writerow({"name": "Alice", "age": 24, "hobbies": "kickbox, Python"})
    ...     writer.writerow({"name": "Bob", "age": 42, "hobbies": "reading, TypeScript"})
    ...
    ```

    We first opened a new file for writing and created a `DictWriter()` instance with the expected field names. We instructed it to write the header row first and then added two more rows containing actual data. The code produced a `data.csv` file in the same directory where we're running the REPL. It has the following contents:

    ```csv title=data.csv
    name,age,hobbies
    Alice,24,"kickbox, Python"
    Bob,42,"reading, TypeScript"
    ```

    In the CSV format, if a value contains commas, we should enclose it in quotes. When we open the file in a text editor of our choice, we can see that the writer automatically handled this.

    When browsing the directory on macOS, we can see a nice preview of the file's contents, which proves that the file is correct and that other programs can read it. If you're using a different operating system, try opening the file with any spreadsheet program you have.

    ![CSV example preview](images/csv-example.png)

    Now that's nice, but we didn't want Alice, Bob, kickbox, or TypeScript. What we actually want is a CSV containing `Sony XBR-950G BRAVIA 4K HDR Ultra HD TV`, right? Let's do this! First, let's add `csv` to our imports:

    ```py
    import httpx
    from bs4 import BeautifulSoup
    from decimal import Decimal
    # highlight-next-line
    import csv
    ```

    Next, let's add one more data export to end of the source code of our scraper:


apify-service-account · 2025-06-27T14:16:02Z

Preview for this PR was built for commit a8e57c7 and is ready at https://pr-1658.preview.docs.apify.com!

honzajavorek added 2 commits June 27, 2025 16:07

style: change order, first json, then csv

153d776

Making this change because in Python it doesn't matter and in JavaScript it's easier to start with JSON, which is built-in, and only then move to CSV, which requires an additional library.

fix: various improvements to the Python lesson about saving data

a8e57c7

honzajavorek requested a review from TC-MO June 27, 2025 14:13

honzajavorek added the t-academy Issues related to Web Scraping and Apify academies. label Jun 27, 2025

cursor bot reviewed Jun 27, 2025


