
In [39]:
import json

def is_title(text):
    """
    Determines whether a text is a title, applying the following criteria in order:
    - Leading and trailing whitespace (including a trailing newline) is ignored.
    - If the text ends with a period ('.'), it is not a title.
    - If all cased characters in the text are uppercase, it is a title.
    - If the text has fewer than 30 characters, it is a title.
    - Otherwise, the text is a title if it follows title capitalization rules: every word must start
      with an uppercase letter, except the common lowercased words listed below, which must still be
      capitalized when they are the first or last word or follow '.', '!', '?' or '-'. Because of
      possible human error, up to two words may break this rule and the text still counts as a title.
    Every input therefore yields True or False.
    """
    common_lower_words = {'and', 'or', 'but', 'nor', 'so', 'for', 'yet', 'after', 'before', 'to', 'of', 'with', 'without', 'within', 'among', 'between', 'by'}

    text = text.strip()  # a trailing newline would otherwise hide a final period

    if text.endswith('.'):
        return False
    if text.isupper():
        return True
    if len(text) < 30:  # texts ending with a period were already rejected above
        return True

    words = text.split()
    lowercase_count = 0  # words that should start uppercase but do not (tolerated human errors)

    for i, word in enumerate(words):
        clean_word = word.strip("-–,.!?").lower()

        # Content words, and the first and last word, must always be capitalized
        if clean_word not in common_lower_words or i == 0 or i == len(words) - 1:
            needs_capital = True
        # Common lowercased words only need a capital after '.', '!', '?' or '-'
        else:
            needs_capital = words[i - 1][-1] in ".!?-"

        if needs_capital and not word[0].isupper():
            lowercase_count += 1
            if lowercase_count > 2:  # more than two mistakes: treat it as body text, not a title
                return False

    return True  # at most two capitalization mistakes: it's a title


def process_and_save_json_data(input_file_path, output_file_path):
    """
    Reads a JSON file (a list of objects) from the given input file path, adds an 'isTitle' key to
    each item based on its 'Text' value, and writes the modified data to a new file at the specified
    output file path.
    """
    try:
        with open(input_file_path, 'r', encoding='utf-8') as infile:
            data = json.load(infile)

        for item in data:
            text = item.get('Text', '')
            item['isTitle'] = is_title(text)

        with open(output_file_path, 'w', encoding='utf-8') as outfile:
            json.dump(data, outfile, indent=4)

        print(f"Modified data has been saved to {output_file_path}")
    except (OSError, json.JSONDecodeError) as e:
        # Missing or unreadable file, or invalid JSON
        print(f"An error occurred: {e}")

# Specify the input and output file paths
input_file_path = '/content/drive/Othercomputers/MBZUAI/MBZUAI/ADGM-Project/ADGM-Docs/StandardizedDocs/FSRA Rulebooks/COBS_VER15.150823.json'
output_file_path = '/content/drive/Othercomputers/MBZUAI/MBZUAI/ADGM-Project/ADGM-Docs/StandardizedDocs/FSRA Rulebooks/COBS_VER15_with_titles.json'

# Process the JSON data and save to a new file
process_and_save_json_data(input_file_path, output_file_path)


Modified data has been saved to /content/drive/Othercomputers/MBZUAI/MBZUAI/ADGM-Project/ADGM-Docs/StandardizedDocs/FSRA Rulebooks/COBS_VER15_with_titles.json
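
To see which words trip the capitalization rule in `is_title`, the loop can be pulled out into a separate helper that returns the offending words instead of a running count. This is a sketch, not part of the original notebook: `capitalization_violations` and `COMMON_LOWER_WORDS` are hypothetical names, and the helper assumes the same stop-word set and punctuation rules as `is_title`, which treats a long text as a title only when this list has at most two entries.

```python
# Hypothetical helper: isolates the capitalization rule used by is_title()
# and returns the words that break it, rather than a True/False verdict.
COMMON_LOWER_WORDS = {
    'and', 'or', 'but', 'nor', 'so', 'for', 'yet', 'after', 'before', 'to',
    'of', 'with', 'without', 'within', 'among', 'between', 'by',
}


def capitalization_violations(text):
    """Return the words that should start uppercase but do not."""
    words = text.strip().split()
    violations = []
    for i, word in enumerate(words):
        clean_word = word.strip("-–,.!?").lower()
        # Same rule as is_title(): content words, the first and last word, and any
        # word following '.', '!', '?' or '-' must start with an uppercase letter.
        must_capitalize = (
            clean_word not in COMMON_LOWER_WORDS
            or i == 0
            or i == len(words) - 1
            or words[i - 1][-1] in ".!?-"
        )
        if must_capitalize and not word[0].isupper():
            violations.append(word)
    return violations


print(capitalization_violations("Conduct of Business Rules for Authorised Persons"))  # []
print(capitalization_violations("Scope of the Rules"))  # ['the']
print(capitalization_violations(
    "This chapter applies to an Authorised Person carrying on any Regulated Activity"
))  # ['chapter', 'applies', 'an', 'carrying', 'on', 'any']
```

The second example shows why the two-mistake tolerance matters: `the`, `a`, `an`, `in` and `on` are not in the stop-word set, so an otherwise well-formed heading can still register one violation.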

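Before reading individual items, it can help to check how the labels are distributed across the file. A minimal sketch, assuming the data is the JSON list of objects written by `process_and_save_json_data` (for example, the result of `json.load` on the output file); `summarize_title_flags` is a hypothetical helper, and the sample items below are made up for illustration:

```python
from collections import Counter


def summarize_title_flags(data):
    """Count the 'isTitle' values in a list of dicts; items without the key count as None."""
    return Counter(item.get('isTitle') for item in data)


# Made-up sample items, not taken from the rulebook
sample = [
    {'Text': 'Application', 'isTitle': True},
    {'Text': 'This chapter applies to an Authorised Person.', 'isTitle': False},
    {'Text': 'Scope', 'isTitle': True},
]
counts = summarize_title_flags(sample)
print(counts[True], counts[False])  # 2 1
```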

In [40]:
import json

def print_titles_from_json(file_path, want_title=True):
    """
    Reads a JSON file from the given file path and prints the 'Text' of each item whose 'isTitle'
    value equals want_title: the titles by default, or the non-title items when want_title=False.
    """
    try:
        with open(file_path, 'r', encoding='utf-8') as json_file:
            data = json.load(json_file)

        for item in data:
            if item.get('isTitle') == want_title:
                print(item.get('Text'))
                print('--------------')

    except (OSError, json.JSONDecodeError) as e:
        # Missing or unreadable file, or invalid JSON
        print(f"An error occurred: {e}")

# Specify the path to the latest JSON file
latest_json_file_path = '/content/drive/Othercomputers/MBZUAI/MBZUAI/ADGM-Project/ADGM-Docs/StandardizedDocs/FSRA Rulebooks/COBS_VER15_with_titles.json'

# Print the items classified as non-titles (isTitle is False) to spot-check the classifier
print_titles_from_json(latest_json_file_path, want_title=False)

Application
This Rulebook applies to every Authorised Person with respect to the carrying on, in or from the Abu Dhabi Global Market, of any:
(a)	Regulated Activity where this involves provision of a service to a Client; or
(b)	activity which is carried on, or held out as being carried on, in connection with or for the purposes of such a Regulated Activity;
except to the extent that a provision provides for a narrower application.

--------------
This chapter applies to an Authorised Person carrying on or intending to carry on any Regulated Activity with or for a Person.

--------------
For the purposes of this chapter, a Person includes any organisation (including outside of the Abu Dhabi Global Market) whether or not it has a separate legal personality.

--------------
This chapter does not apply to:
(a)	a Credit Rating Agency in so far as it carries on, or intends to carry on, the Regulated Activity of Operating a Credit Rating Agency;
(b)	an Authorised ISPV;
(c)	an Authorised Perso