<a href="https://colab.research.google.com/github/victor-roris/mediumseries/blob/master/NLP/Blackstone_Spacy_model_for_legal_texts.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Blackstone

Blackstone is a spaCy model and library for processing long-form, unstructured legal text. 

- Web: https://research.iclr.co.uk/blackstone
- GitHub: https://github.com/ICLRandD/Blackstone

Its main legal-specific components are:

* *Named-Entity Recogniser (NER)*: identifies the following entity types.

  - CASENAME: Case names
  - CITATION: Citations (unique identifiers for reported and unreported cases)
  - INSTRUMENT: Written legal instruments
  - PROVISION: Units within a written legal instrument
  - COURT: Courts or tribunals
  - JUDGE: References to judges

* *Text Categoriser*: classifies longer spans of text, such as sentences, into the following categories.

  - AXIOM: The text appears to postulate a well-established principle
  - CONCLUSION: The text appears to make a finding, holding, determination or conclusion
  - ISSUE: The text appears to discuss an issue or question
  - LEGAL_TEST: The text appears to discuss a legal test
  - UNCAT: The text does not fall into any of the four categories above

## Installation

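Install the library. spaCy is pinned to 2.1.4 first, because the Blackstone prototype model may fail with other spaCy versions.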

In [0]:
! pip install 'spacy==2.1.4'

In [1]:
! pip install blackstone

Collecting blackstone
  Downloading https://files.pythonhosted.org/packages/ee/c7/302e2b1ccb8414b9d360fe5d8ea8257f781c6ddc0763e128a302bf6f629a/blackstone-0.1.13-py3-none-any.whl (108kB)
Collecting conllu
  Downloading https://files.pythonhosted.org/pack

Install the Blackstone model

In [2]:
!pip install https://blackstone-model.s3-eu-west-1.amazonaws.com/en_blackstone_proto-0.0.1.tar.gz

Collecting https://blackstone-model.s3-eu-west-1.amazonaws.com/en_blackstone_proto-0.0.1.tar.gz
  Downloading https://blackstone-model.s3-eu-west-1.amazonaws.com/en_blackstone_proto-0.0.1.tar.gz (243.3MB)
Building wheels for collected packages: en-blackstone-proto
  Building wheel for en-blackstone-proto (setup.py) ... done
  Created wheel for en-blackstone-proto: filename=en_blackstone_proto-0.0.1-cp36-none-any.whl size=243759405 sha256=4cdc2b65bd354a739fc195625d29128c6401d34b6c8aa7295e5a4dbf4d00d5dc
  Stored in directory: /root/.cache/pip/wheels/a2/81/dd/09c3b4ef7899e7d9cf92ed3152d29a08b5fd80f7a8bf66df4d
Successfully built en-blackstone-proto
Installing collected packages: en-blackstone-proto
Successfully installed en-blackstone-proto-0.0.1


Load the Blackstone model

In [0]:
import spacy
nlp = spacy.load("en_blackstone_proto")

In [2]:
assert spacy.__version__ == '2.1.4', 'Blackstone may fail with spaCy versions other than 2.1.4'

AssertionError: ignored
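The assertion above compares version strings exactly, so a patch release such as 2.1.8 would also fail it. If any spaCy 2.1.x release is acceptable (an assumption this notebook does not verify), a looser check on the major and minor numbers is enough. A minimal sketch with a hypothetical helper, `matches_minor_version`:

```python
def matches_minor_version(installed, required="2.1.4"):
    """Return True if `installed` has the same major.minor version as `required`.

    Assumption (not verified here): any patch release of spaCy 2.1 works
    with the en_blackstone_proto model.
    """
    return installed.split(".")[:2] == required.split(".")[:2]


# In the notebook, pass the real version: matches_minor_version(spacy.__version__)
print(matches_minor_version("2.1.8"))  # True
print(matches_minor_version("2.2.3"))  # False
```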

In [0]:
nlp.pipe_names

['sentencizer', 'tagger', 'parser', 'ner', 'textcat']

## Applying the NER model

In [3]:
text = """ 31 As we shall explain in more detail in examining the submission of the Secretary of State (see paras 77 and following), it is the Secretary of State’s case that nothing has been done by Parliament in the European Communities Act 1972 or any other statute to remove the prerogative power of the Crown, in the conduct of the international relations of the UK, to take steps to remove the UK from the EU by giving notice under article 50EU for the UK to withdraw from the EU Treaty and other relevant EU Treaties. The Secretary of State relies in particular on Attorney General v De Keyser’s Royal Hotel Ltd [1920] AC 508 and R v Secretary of State for Foreign and Commonwealth Affairs, Ex p Rees-Mogg [1994] QB 552; he contends that the Crown’s prerogative power to cause the UK to withdraw from the EU by giving notice under article 50EU could only have been removed by primary legislation using express words to that effect, alternatively by legislation which has that effect by necessary implication. The Secretary of State contends that neither the ECA 1972 nor any of the other Acts of Parliament referred to have abrogated this aspect of the Crown’s prerogative, either by express words or by necessary implication."""

# Apply the model to the text
doc = nlp(text)

# Iterate through the entities identified by the model
for ent in doc.ents:
    print(ent.text, ent.label_)

European Communities Act 1972 INSTRUMENT
article 50EU PROVISION
EU Treaty INSTRUMENT
Attorney General v De Keyser’s Royal Hotel Ltd CASENAME
[1920] AC 508 CITATION
R v Secretary of State for Foreign and Commonwealth Affairs, Ex p Rees-Mogg CASENAME
[1994] QB 552 CITATION
article 50EU PROVISION
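The flat list above repeats `article 50EU`. A small helper (hypothetical, not part of Blackstone) can group the `(text, label)` pairs by label and drop repeated mentions. Here it is fed the pairs printed above; with the model loaded you would pass `((ent.text, ent.label_) for ent in doc.ents)` instead.

```python
from collections import defaultdict


def group_by_label(entities):
    """Group (text, label) pairs by label, keeping first-seen order and dropping repeats."""
    grouped = defaultdict(list)
    for text, label in entities:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


# The (text, label) pairs printed by the NER cell above.
entities = [
    ("European Communities Act 1972", "INSTRUMENT"),
    ("article 50EU", "PROVISION"),
    ("EU Treaty", "INSTRUMENT"),
    ("Attorney General v De Keyser’s Royal Hotel Ltd", "CASENAME"),
    ("[1920] AC 508", "CITATION"),
    ("R v Secretary of State for Foreign and Commonwealth Affairs, Ex p Rees-Mogg", "CASENAME"),
    ("[1994] QB 552", "CITATION"),
    ("article 50EU", "PROVISION"),
]

grouped = group_by_label(entities)
for label, texts in grouped.items():
    print(label, texts)
```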


In [4]:
"""
Visualise entities using spaCy's displacy visualiser.

Blackstone has a custom colour palette: `from blackstone.displacy_palette import ner_displacy_options`
"""

import spacy
from spacy import displacy
from blackstone.displacy_palette import ner_displacy_options

nlp = spacy.load("en_blackstone_proto")

text = """
The applicant must satisfy a high standard. This is a case where the action is to be tried by a judge with a jury. The standard is set out in Jameel v Wall Street Journal Europe Sprl [2004] EMLR 89, para 14:
“But every time a meaning is shut out (including any holding that the words complained of either are, or are not, capable of bearing a defamatory meaning) it must be remembered that the judge is taking it upon himself to rule in effect that any jury would be perverse to take a different view on the question. It is a high threshold of exclusion. Ever since Fox’s Act 1792 (32 Geo 3, c 60) the meaning of words in civil as well as criminal libel proceedings has been constitutionally a matter for the jury. The judge’s function is no more and no less than to pre-empt perversity. That being clearly the position with regard to whether or not words are capable of being understood as defamatory or, as the case may be, non-defamatory, I see no basis on which it could sensibly be otherwise with regard to differing levels of defamatory meaning. Often the question whether words are defamatory at all and, if so, what level of defamatory meaning they bear will overlap.”
18 In Berezovsky v Forbes Inc [2001] EMLR 1030, para 16 Sedley LJ had stated the test this way:
“The real question in the present case is how the courts ought to go about ascertaining the range of legitimate meanings. Eady J regarded it as a matter of impression. That is all right, it seems to us, provided that the impression is not of what the words mean but of what a jury could sensibly think they meant. Such an exercise is an exercise in generosity, not in parsimony.”
"""

doc = nlp(text)

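# Pass `ner_displacy_options` to the `options` parameter to use Blackstone's colour palette.
# displacy.serve starts a local web server; in a notebook, displacy.render(doc, style="ent",
# options=ner_displacy_options, jupyter=True) displays the entities inline instead.
displacy.serve(doc, style="ent", options=ner_displacy_options)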


Using the 'ent' visualizer
Serving on http://0.0.0.0:5000 ...

Shutting down server on port 5000.


## Applying the text categoriser model

In [5]:
def get_top_cat(doc):
    """
    Function to identify the highest scoring category
    prediction generated by the text categoriser. 
    """
    cats = doc.cats
    max_score = max(cats.values()) 
    max_cats = [k for k, v in cats.items() if v == max_score]
    max_cat = max_cats[0]
    return (max_cat, max_score)

text = """
It is a well-established principle of law that the transactions of independent states between each other are governed by other laws than those which municipal courts administer. \
It is, however, in my judgment, insufficient to react to the danger of over-formalisation and “judicialisation” simply by emphasising flexibility and context-sensitivity. \
The question is whether on the facts found by the judge, the (or a) proximate cause of the loss of the rig was “inherent vice or nature of the subject matter insured” within the meaning of clause 4.4 of the Institute Cargo Clauses (A).
"""

# Apply the model to the text
doc = nlp(text)

# Get the sentences in the passage of text
sentences = [sent.text for sent in doc.sents]

# Print the sentence and the corresponding predicted category.
for sentence in sentences:
    doc = nlp(sentence)
    top_category = get_top_cat(doc)
    print(f"\"{sentence}\" {top_category}\n")

"
It is a well-established principle of law that the transactions of independent states between each other are governed by other laws than those which municipal courts administer." ('AXIOM', 0.9461941123008728)

"It is, however, in my judgment, insufficient to react to the danger of over-formalisation and “judicialisation” simply by emphasising flexibility and context-sensitivity." ('CONCLUSION', 0.9293838143348694)

"The question is whether on the facts found by the judge, the (or a) proximate cause of the loss of the rig was “inherent vice or nature of the subject matter insured” within the meaning of clause 4.4 of the Institute Cargo Clauses (A)." ('ISSUE', 0.5091703534126282)

"
" ('UNCAT', 1.0)



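`get_top_cat` builds a list of every label tied for the top score and keeps the first one. The same result can be obtained in a single pass with `max` and a key function, because `max` also returns the first maximal item in iteration order. This variant (renamed `top_category`, a hypothetical name) takes the `doc.cats` dict rather than the `Doc`, so it can be tried without loading the model; the scores are invented for illustration only.

```python
def top_category(cats):
    """Return the (label, score) pair with the highest score.

    `cats` is a dict of label -> score, such as spaCy's `doc.cats`.
    On ties, `max` keeps the first label in iteration order, like get_top_cat.
    """
    label = max(cats, key=cats.get)
    return label, cats[label]


# Hypothetical scores, invented for illustration (not model output).
scores = {"AXIOM": 0.02, "CONCLUSION": 0.05, "ISSUE": 0.85, "LEGAL_TEST": 0.03, "UNCAT": 0.05}
print(top_category(scores))  # ('ISSUE', 0.85)
```

In the notebook loop this would be called as `top_category(nlp(sentence).cats)`.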
## Custom pipeline extensions

### Abbreviation detection and long-form definition resolution

Authors of legal documents often define an abbreviation for a long-winded term and then use the abbreviation instead of the long form throughout the rest of the document. For example,

  > The European Court of Human Rights ("ECtHR") is the court ultimately responsible for applying the European Convention on Human Rights ("ECHR").

Blackstone's `AbbreviationDetector` is heavily based on the `AbbreviationDetector` component in [scispaCy](https://github.com/allenai/scispacy).

In [6]:
from blackstone.pipeline.abbreviations import AbbreviationDetector

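# Add the abbreviation pipe to the spaCy pipeline (skipped if it is already there,
# so the cell can be re-run without a duplicate-name error).
abbreviation_pipe = AbbreviationDetector(nlp)
if not nlp.has_pipe('AbbreviationDetector'):
    nlp.add_pipe(abbreviation_pipe)

doc = nlp('The European Court of Human Rights ("ECtHR") is the court ultimately responsible for applying the European Convention on Human Rights ("ECHR").')

# Print each abbreviation, its token span and the long form it resolves to.
print("Abbreviation", "\t", "Definition")
for abrv in doc._.abbreviations:
    print(f"{abrv} \t ({abrv.start}, {abrv.end}) {abrv._.long_form}")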

Abbreviation 	 Definition
"ECtHR" 	 (7, 10) European Court of Human Rights
"ECHR" 	 (25, 28) European Convention on Human Rights


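scispaCy describes its `AbbreviationDetector` as an implementation of the Schwartz & Hearst (2003) algorithm for identifying abbreviation definitions. The plain-Python sketch below illustrates the core idea on the same sentence: each parenthesised short form is matched character by character, right to left, against the few words that precede it. It is a simplified illustration, not Blackstone's or scispaCy's code; the regular expression, the quote handling and the helper names `find_best_long_form` and `detect_abbreviations` are assumptions of this sketch.

```python
import re

# A parenthesised candidate short form, optionally wrapped in quotes, e.g. ("ECtHR").
PAREN = re.compile(r"\(\s*[\"“”']?([^()\"“”']+)[\"“”']?\s*\)")


def find_best_long_form(short_form, long_form):
    """Match the short form's characters right to left inside the long form."""
    s_index = len(short_form) - 1
    l_index = len(long_form) - 1
    while s_index >= 0:
        current = short_form[s_index].lower()
        if not current.isalnum():
            s_index -= 1
            continue
        # Skip non-matching characters; the first short-form character
        # must also sit at the start of a word in the long form.
        while (l_index >= 0 and long_form[l_index].lower() != current) or (
            s_index == 0 and l_index > 0 and long_form[l_index - 1].isalnum()
        ):
            l_index -= 1
        if l_index < 0:
            return None
        l_index -= 1
        s_index -= 1
    # Start the long form at the beginning of the word holding the first match.
    start = long_form.rfind(" ", 0, l_index + 1) + 1
    return long_form[start:]


def detect_abbreviations(text):
    """Return (short_form, long_form) pairs for 'long form (SF)' patterns."""
    pairs = []
    for match in PAREN.finditer(text):
        short_form = match.group(1).strip()
        # Short-form constraints: 2-10 characters, at most two words,
        # a leading letter or digit, and at least one letter.
        if not 2 <= len(short_form) <= 10 or len(short_form.split()) > 2:
            continue
        if not short_form[0].isalnum() or not any(c.isalpha() for c in short_form):
            continue
        # Only consider the last min(|SF| + 5, 2 * |SF|) words before "(".
        max_words = min(len(short_form) + 5, 2 * len(short_form))
        window = " ".join(text[: match.start()].split()[-max_words:])
        long_form = find_best_long_form(short_form, window)
        if long_form and len(long_form) > len(short_form):
            pairs.append((short_form, long_form))
    return pairs


sentence = ('The European Court of Human Rights ("ECtHR") is the court ultimately '
            'responsible for applying the European Convention on Human Rights ("ECHR").')
for short_form, long_form in detect_abbreviations(sentence):
    print(short_form, "->", long_form)
# ECtHR -> European Court of Human Rights
# ECHR -> European Convention on Human Rights
```

Unlike Blackstone, which returns spaCy spans (including the surrounding quotes), the sketch returns plain strings. Defined terms such as `(the "Agreement")` name a term rather than abbreviate the preceding words, so a detector of this kind is not expected to report them.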
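## Application to NDAs

To test whether Blackstone carries over to contracts, the same components are applied to the opening paragraphs (headers) of six non-disclosure agreements (NDAs).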

In [0]:
nda_headers = [
    """THIS AGREEMENT (the "Agreement”) is entered into on this 12th day of June 2019 by and between MindData SA, located at 235 Rua Priego, San Fernando, Madrid ( the” Disclosing Party”), and DataSpartan Ltd. with an address at 788 98 Meeting Point, Moorgate, London (the “Receiving Party”).""",
    """This non disclosure agreement is entered into on 8th day of December 2021 between Omar Akhur Mohamed (Owner) and Veronica Escobar Montoya (Recipient).""",
    """THIS AGREEMENT is made on 2019/08/08. between 1 INTEGRATED HEALTH INFORMATION SYSTEMS PTE LTD (ACRA No. 200814464H), a company incorporated in Singapore and having its office at 6 Serangoon North Ave 5 #01-01/02 Singapore 554910 (hereinafter referred to as “IHIS”); and 2 REBORN TEAM CORP (ACRA No. 576190239U), a company incorporated in United Kingdom and having its office at 51 Jupyter Bridge, Golden Town, Cambridge (hereinafter referred to as “the Company”).""",
    """Non Disclosure Agreement Higrid T. Harrison having an address of 98B Walmart Road, McDon, RedBrooks, Kunilan (hereinafter referred to as "Recipient"), and  Bussiness Risk Invest. LTD, having an address of YL 32 Orange Serv Build, Silonia, Parrow, (hereinafter referred to as "Owner"), hereby agree that:""",
    """This Non-Disclosure Agreement (the "Agreement"), effective as of the date last entered below (the "Effective Date"), is entered into by and between Ramen Skashi Int. Ltd (the "Disclosing Party") and the Recipient named below (the "Recipient", and together with the Disclosing Party, the "Parties", and each, a "Party").""",
    """THIS CONFIDENTIALITY AND NON-DISCLOSURE AGREEMENT (this “Agreement”) is made and entered into as of the 20/11/2017 of Agreement set forth above by and between [Naamloze Vennootschap N.V.] and [Dotdash publishing]."""
]

* **NER**

In [8]:
for index, nda_header in enumerate(nda_headers):
    
    print(f"-- Text : {index}")
    
    # Apply the model to the text
    doc = nlp(nda_header)

    # Iterate through the entities identified by the model
    for ent in doc.ents:
        print(f"\t {ent.text} -> {ent.label_}")

-- Text : 0
-- Text : 1
	 Omar Akhur Mohamed (Owner) and Veronica Escobar Montoya (Recipient) -> CASENAME
-- Text : 2
	 INTEGRATED HEALTH INFORMATION SYSTEMS -> JUDGE
	 5 #01 -> CASENAME
-- Text : 3
	 YL 32 -> PROVISION
-- Text : 4
	 "Effective -> COURT
	 Ramen Skashi Int. -> CASENAME
-- Text : 5
	 THIS CONFIDENTIALITY -> JUDGE


* **Text categoriser**

In [9]:
def get_top_cat(doc):
    """
    Function to identify the highest scoring category
    prediction generated by the text categoriser. 
    """
    cats = doc.cats
    max_score = max(cats.values()) 
    max_cats = [k for k, v in cats.items() if v == max_score]
    max_cat = max_cats[0]
    return (max_cat, max_score)

for index, nda_header in enumerate(nda_headers):
    
    print(f"-- Text : {index}")
    
    # Apply the model to the text
    doc = nlp(nda_header)
    
    # Get the sentences in the passage of text
    sentences = [sent.text for sent in doc.sents]

    # Print the sentence and the corresponding predicted category.
    for sentence in sentences:
        doc = nlp(sentence)
        top_category = get_top_cat(doc)
        print(f"\t \"{sentence}\" {top_category}")

-- Text : 0
	 "THIS AGREEMENT (the "Agreement”) is entered into on this 12th day of June 2019 by and between MindData SA, located at 235 Rua Priego, San Fernando, Madrid ( the” Disclosing Party”), and DataSpartan Ltd. with an address at 788 98 Meeting Point, Moorgate, London (the “Receiving Party”)." ('UNCAT', 0.9713473320007324)
-- Text : 1
	 "This non disclosure agreement is entered into on 8th day of December 2021 between Omar Akhur Mohamed (Owner) and Veronica Escobar Montoya (Recipient)." ('UNCAT', 0.9999927282333374)
-- Text : 2
	 "THIS AGREEMENT is made on 2019/08/08." ('UNCAT', 0.9999996423721313)
	 "between 1 INTEGRATED HEALTH INFORMATION SYSTEMS PTE LTD (ACRA No." ('ISSUE', 0.9057997465133667)
	 "200814464H), a company incorporated in Singapore and having its office at 6 Serangoon North Ave 5 #01-01/02 Singapore 554910 (hereinafter referred to as “IHIS”); and 2 REBORN TEAM CORP (ACRA No." ('UNCAT', 0.996208906173706)
	 "576190239U), a company incorporated in United Kingdom an

* **Abbreviation detection and long-form definition resolution**

In [10]:
from blackstone.pipeline.abbreviations import AbbreviationDetector

# Add the abbreviation pipe to the spacy pipeline.
abbreviation_pipe = AbbreviationDetector(nlp)
if not nlp.has_pipe('AbbreviationDetector'):
    nlp.add_pipe(abbreviation_pipe)

for index, nda_header in enumerate(nda_headers):
    
    print(f"-- Text : {index}")
    
    # Apply the model to the text
    doc = nlp(nda_header)
    
    print("Abbreviation", "\t", "Definition")
    for abrv in doc._.abbreviations:
        print(f"\t {abrv} \t ({abrv.start}, {abrv.end}) {abrv._.long_form}")

-- Text : 0
Abbreviation 	 Definition
-- Text : 1
Abbreviation 	 Definition
-- Text : 2
Abbreviation 	 Definition
-- Text : 3
Abbreviation 	 Definition
-- Text : 4
Abbreviation 	 Definition
-- Text : 5
Abbreviation 	 Definition


## Conclusion

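Blackstone was trained to identify entities and categorise text in court judgments and similar legal writing. It does not appear suitable for analysing agreements: on the NDA headers, the NER assigns case-law labels to parties and addresses (for example, a company name tagged as JUDGE and a pair of party names tagged as CASENAME), the text categoriser labels most of the sentences shown as UNCAT, and the abbreviation detector finds no definitions. It is therefore not useful for the NDA project.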