
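In PyPDF2 (a library for working with PDF files in Python), `PdfFileReader()` and `PdfFileWriter()` handle file objects differently. `PdfFileReader()` must be passed a file object that you open yourself with Python's built-in `open()` function in read-binary mode (`'rb'`). `PdfFileWriter()` takes no file argument at all. Instead, you pass a file object opened in write-binary mode (`'wb'`) to the writer's `write()` method when you are ready to save.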

Here's how you would typically use `PdfFileReader()` and `PdfFileWriter()`:

1. PdfFileReader():
This class is used to read and extract information from existing PDF files. You should pass the PDF file object opened in binary read mode ('rb').

```python
from PyPDF2 import PdfFileReader

with open('input.pdf', 'rb') as file:
    pdf_reader = PdfFileReader(file)
    # You can now use pdf_reader to access information about the PDF.
```

2. PdfFileWriter():
This class is used to build a new PDF in memory, for example from pages copied out of existing files. The constructor takes no arguments; you pass a PDF file object opened in binary write mode ('wb') to its `write()` method.

```python
from PyPDF2 import PdfFileWriter

pdf_writer = PdfFileWriter()  # no file object is passed to the constructor
# Use pdf_writer to add pages, annotations, etc., then save the result:
with open('output.pdf', 'wb') as file:
    pdf_writer.write(file)  # the 'wb' file object goes to write()
```

Remember to open the file objects in binary mode ('rb' for reading and 'wb' for writing) when working with PDF files. PDFs are binary files: in text mode, Python would try to decode the bytes as text and translate line endings, which raises errors or corrupts the data.

Note that these examples use the original PyPDF2 API (`PdfFileReader`, `PdfFileWriter`, `getPage()`, and so on), which works in PyPDF2 1.x and 2.x. PyPDF2 3.0 replaced these names with `PdfReader`, `PdfWriter`, `reader.pages`, and similar, and raises a deprecation error for the old ones; development has since continued under the package name `pypdf`. Check which version you have installed and consult its documentation.

To get a `Page` object for page 5 from a `PdfFileReader` object in the PyPDF2 library, use the `getPage()` method. PyPDF2 numbers pages starting from 0, so page 5 is at index 4: call `getPage(4)`.

Here's how you can do it:

```python
from PyPDF2 import PdfFileReader

with open('input.pdf', 'rb') as file:
    pdf_reader = PdfFileReader(file)

    # Getting the Page object for page 5 (0-indexed, so pass index 4)
    page_number = 4  # Page 5 corresponds to index 4
    page = pdf_reader.getPage(page_number)

    # Now you can work with the Page object, for example, get its content or extract text from it.
    page_text = page.extractText()
    print(page_text)
```

Make sure to replace `'input.pdf'` with the path to your actual PDF file. After obtaining the `Page` object, you can perform various operations like extracting text, adding annotations, merging, etc., depending on your use case.

In the PyPDF2 library, the number of pages in a PDF document can be obtained using the `getNumPages()` method of the `PdfFileReader` class. This method returns an integer representing the total number of pages in the PDF file.

Here's how you can use it:

```python
from PyPDF2 import PdfFileReader

with open('input.pdf', 'rb') as file:
    pdf_reader = PdfFileReader(file)

    # Get the number of pages in the PDF document
    num_pages = pdf_reader.getNumPages()

    print(f"The PDF contains {num_pages} pages.")
```

Replace `'input.pdf'` with the path to your actual PDF file. After running this code, it will print the total number of pages in the specified PDF document.

If a `PdfFileReader` object's PDF is encrypted with the password `'swordfish'`, you must decrypt it before you can obtain `Page` objects from it.

To do this, call the reader's `decrypt()` method and pass the password as an argument: `pdf_reader.decrypt('swordfish')`.

Here's how you can do it:

```python
from PyPDF2 import PdfFileReader

with open('encrypted_input.pdf', 'rb') as file:
    pdf_reader = PdfFileReader(file)

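    # Provide the password to decrypt the PDF (replace 'swordfish' with the actual password).
    # decrypt() returns 0 when the password is wrong.
    password = 'swordfish'
    if pdf_reader.isEncrypted and pdf_reader.decrypt(password) == 0:
        raise ValueError('Incorrect password')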

    # Now you can obtain Page objects from the decrypted PDF
    num_pages = pdf_reader.getNumPages()
    print(f"The PDF contains {num_pages} pages.")

    # For example, let's get the Page object for page 5 (0-indexed, so pass index 4)
    page_number = 4  # Page 5 corresponds to index 4
    page = pdf_reader.getPage(page_number)

    # Now you can work with the Page object, for example, get its content or extract text from it.
    page_text = page.extractText()
    print(page_text)
```

Replace `'encrypted_input.pdf'` with the path to your actual encrypted PDF file. In PyPDF2, `decrypt()` returns 1 or 2 when the password matches the user or owner password and 0 when it does not. Until a call succeeds, trying to read pages raises an error (`PdfReadError` in PyPDF2 1.x).

To rotate a page in a PDF using the PyPDF2 library in Python, use the `rotateClockwise()` or `rotateCounterClockwise()` methods of the `Page` object. Each takes an angle in degrees, which must be a multiple of 90 (90, 180, or 270), and rotates the page clockwise or counterclockwise, respectively.

Here's how you can use these methods:

```python
from PyPDF2 import PdfFileReader, PdfFileWriter

# Open the PDF file and create PdfFileReader and PdfFileWriter objects
with open('input.pdf', 'rb') as file:
    pdf_reader = PdfFileReader(file)
    pdf_writer = PdfFileWriter()

    # Choose the page number you want to rotate (0-indexed, so page 1 is index 0)
    page_number = 0

    # Get the selected page from the PdfFileReader
    page = pdf_reader.getPage(page_number)

    # Rotate the page clockwise (90 degrees)
    page.rotateClockwise(90)

    # Alternatively, you can rotate the page counter-clockwise (90 degrees)
    # page.rotateCounterClockwise(90)

    # Add the rotated page to the PdfFileWriter
    pdf_writer.addPage(page)

    # Save the rotated page to a new PDF file
    with open('output.pdf', 'wb') as output_file:
        pdf_writer.write(output_file)
```

In this example, we open the input PDF file and create `PdfFileReader` and `PdfFileWriter` objects. We choose the page number we want to rotate (0-indexed, so page 1 is index 0). We then retrieve the selected page using `getPage()` from the `PdfFileReader` object. Next, we rotate the page clockwise using `rotateClockwise()` or counter-clockwise using `rotateCounterClockwise()`. Finally, we add the rotated page to the `PdfFileWriter` object and save the changes to a new PDF file using `write()`.

Remember to replace `'input.pdf'` with the path to your actual PDF file, and the rotated page will be saved as `'output.pdf'`.

The concepts of "Run" and "Paragraph" objects are commonly associated with word processing software or libraries that provide tools for document manipulation. Let's explore their definitions and differences:

1. Run Object:
In the context of word processing software like Microsoft Word or libraries like Python's `python-docx`, a "Run" represents a continuous range of characters within a paragraph that share the same character formatting. Character formatting includes attributes like font size, font style (bold, italic, etc.), font color, and more. If you change the formatting of a run, it only affects the characters within that specific run.

For example, consider the following sentence:
"Please make **bold** and *italic* text."

In this sentence, "Please make ", "bold", " and ", "italic", and " text." would be five separate runs, because the character formatting changes at each boundary. If you change the font color of the "bold" run, it won't affect the rest of the sentence.

2. Paragraph Object:
A "Paragraph" is a block of text that typically represents a logical unit of content. It is a collection of one or more runs of text. A paragraph can contain multiple runs with different formatting, or it can have just one run with uniform formatting.

Using the same example as above, the entire sentence would be contained within a single paragraph, and each segment (i.e., "Please make," "bold," "and," "italic," "text") would be separate runs within that paragraph.

Summary of Differences:
- A "Run" is a continuous range of characters with the same character formatting.
- A "Paragraph" is a logical unit of content containing one or more runs.
- A paragraph can consist of multiple runs with different formatting, whereas a run only contains characters with identical formatting.
- Modifying the formatting of a run affects only the characters within that run, while modifying the paragraph's formatting could affect all the runs within that paragraph.

It's important to note that the terminology may vary depending on the word processing software or library being used, but the basic concepts of runs and paragraphs generally remain consistent.

To obtain a list of Paragraph objects from a Document object stored in a variable named `doc`, use the `paragraphs` attribute provided by `python-docx`. For comparison, the examples below also show `docx2txt`, a library that only extracts plain text.

1. Using python-docx:
`python-docx` is a library that allows you to work with Microsoft Word `.docx` files in Python. To get a list of Paragraph objects from a Document object, you can use the `paragraphs` attribute of the Document class.

Here's an example:

```python
from docx import Document

# Assuming you have already loaded the .docx file into the `doc` variable
# For example, doc = Document('your_file.docx')

# Get a list of Paragraph objects from the Document object
paragraphs_list = doc.paragraphs

# Now you can work with the list of Paragraph objects
for paragraph in paragraphs_list:
    print(paragraph.text)
```

2. Using docx2txt:
`docx2txt` is a library that extracts text from Microsoft Word `.docx` files. It doesn't provide Paragraph objects; `docx2txt.process()` takes a file path and returns the document's text as a single string.

Here's an example:

```python
import docx2txt

# docx2txt works from the file path, not from a python-docx Document object
doc_text = docx2txt.process('your_file.docx')

# The result is one string; splitting on newlines gives rough paragraph text
lines = [line for line in doc_text.splitlines() if line.strip()]
print(lines)
```

In the second example, the `docx2txt.process()` function extracts the text content as a string, and you won't have direct access to Paragraph objects like you would with `python-docx`.

Choose the library that best suits your needs, and use the provided approach to obtain the content of the Document as a list of Paragraph objects or as plain text, depending on your requirements.

The type of object that typically has attributes such as `bold`, `underline`, `italic`, `strike`, and `outline` is a "Run" object in the context of word processing libraries or frameworks. Specifically, this is common when working with libraries that allow you to manipulate document formats like Microsoft Word's `.docx` files through Python.

Each of these attributes can be `True`, `False`, or `None` (not set on the run, so the value is inherited from the style):

- `bold`: whether the characters in the run are bold.
- `underline`: whether the characters in the run are underlined; in `python-docx` it also accepts a `WD_UNDERLINE` value such as `WD_UNDERLINE.DOUBLE` for other underline styles.
- `italic`: whether the characters in the run are italic.
- `strike`: whether the characters in the run have a strikethrough line.
- `outline`: whether the characters in the run are drawn with an outline style.

In `python-docx`, `bold`, `italic`, and `underline` are available directly on the `Run` object, while `strike`, `outline`, and other character settings belong to the run's `Font` object, reached through `run.font` (which exposes `bold`, `italic`, and `underline` as well).

Here's a simple example of how you might use these attributes:

```python
from docx import Document

doc = Document()
paragraph = doc.add_paragraph()

# Add a run to the paragraph and apply several character formats to it
run = paragraph.add_run('This is some sample text.')
run.bold = True
run.italic = True
run.underline = True
run.font.strike = True   # strike and outline live on the run's Font object
run.font.outline = True

doc.save('formatted_document.docx')
```

In this example, we create a new document, add a paragraph, and add a "Run" object with formatted text. The text will be bold, italic, underlined, strikethrough, and have an outline style when saved in the `formatted_document.docx` file.

Keep in mind that the availability of these attributes and their specific usage might vary depending on the word processing library or format you are working with. The example provided here is based on the `python-docx` library for handling Microsoft Word `.docx` files.

In the context of character formatting in word processing libraries such as `python-docx`, the values `False`, `True`, and `None` for a run's `bold` attribute indicate different states of character boldness:

1. `False`: Setting `bold` to `False` means that the characters within the "Run" object (i.e., the specified text) are not formatted as bold. The text appears with the default or regular font weight.

Example:
```python
run = paragraph.add_run('This is normal text.')
run.bold = False
```

2. `True`: Setting `bold` to `True` means that the characters within the "Run" object are formatted as bold. The text appears with a bold font weight.

Example:
```python
run = paragraph.add_run('This is bold text.')
run.bold = True
```

3. `None`: When the `bold` attribute of a "Run" object is set to `None`, the boldness of the characters is not specified at the run level. The text then inherits its boldness from the style hierarchy: a character style applied to the run, the paragraph's style, or the document defaults. If none of these sets boldness, the text is displayed with the regular font weight.

Example:
```python
run = paragraph.add_run('This is text with inherited boldness.')
run.bold = None  # boldness now comes from the style hierarchy
```

By using `False`, `True`, or `None`, you can control the boldness of specific text within your document, applying different formatting as needed. Remember that the behavior might vary depending on the specific word processing library or document format being used.

To create a new Word document and obtain a `Document` object using the `python-docx` library, you can follow these steps:

1. Install the `python-docx` library (if you haven't already):
You can install the library using `pip` by running the following command in your terminal or command prompt:

```bash
pip install python-docx
```

2. Import the necessary module and create the `Document` object:
Once you have the `python-docx` library installed, you can import the required module and create a new `Document` object to work with the Word document.

Here's a simple example:

```python
from docx import Document

# Create a new Document object
doc = Document()

# Add content to the document (optional)
doc.add_heading('Title', level=1)
doc.add_paragraph('This is the first paragraph.')
doc.add_paragraph('This is the second paragraph.')

# Save the document to a file (optional)
doc.save('new_document.docx')
```

In this example, we import the `Document` class from the `docx` module and create a new `Document` object using `doc = Document()`. After that, we add some content to the document (heading and paragraphs), but this step is optional, depending on your specific requirements. Finally, we save the document to a file named `'new_document.docx'` using `doc.save()`.

If you don't need to save the document to a file and only want to work with the in-memory representation of the document, you can skip the `doc.save()` step.

That's it! You now have a new Word document represented by the `Document` object, and you can add content or perform various manipulations using the `python-docx` library.

To add a paragraph with the text `'Hello, there!'` to a `Document` object stored in a variable named `doc`, you can use the `add_paragraph()` method of the `Document` class. Here's how you can do it:

```python
from docx import Document

# Assuming you have already created a Document object and stored it in the variable 'doc'
# For example, doc = Document()

# Add a paragraph with the text 'Hello, there!' to the Document
text_to_add = 'Hello, there!'
doc.add_paragraph(text_to_add)

# Optionally, you can also save the Document to a file
doc.save('output_document.docx')
```

In this example, we use the `add_paragraph()` method to add a new paragraph to the `Document` object. The `text_to_add` variable contains the text `'Hello, there!'`, which is added as the content of the paragraph. If you save the `Document` to a file using `doc.save('output_document.docx')`, the text will be visible in the resulting Word document.

Remember that you should have already created the `Document` object before using the `add_paragraph()` method. If you haven't done that, you can create a new `Document` object using `doc = Document()` before adding paragraphs or other content.

In Word documents, the built-in heading styles are Heading 1 through Heading 9, so heading levels are represented by the integers 1 to 9. These levels define the hierarchical structure of the document, which helps organize the content and lets Word build a table of contents with clickable links. (The `python-docx` method `add_heading()` additionally accepts level 0, which applies the Title style rather than a heading style.)

The integer values and their corresponding heading levels are as follows:

- Heading Level 1: Represented by integer 1
- Heading Level 2: Represented by integer 2
- Heading Level 3: Represented by integer 3
- Heading Level 4: Represented by integer 4
- Heading Level 5: Represented by integer 5
- Heading Level 6: Represented by integer 6
- Heading Level 7: Represented by integer 7
- Heading Level 8: Represented by integer 8
- Heading Level 9: Represented by integer 9

For example, when you apply Heading 1 style to a paragraph in a Word document, it will be associated with heading level 1. Similarly, when you apply Heading 2 style to another paragraph, it will be associated with heading level 2, and so on.

Keep in mind that the specific styles and numbering of heading levels might vary depending on the Word version and the template being used. However, the concept of using integers to represent heading levels and create a hierarchical structure remains consistent across most versions of Microsoft Word.