Pick and implement an approach to JSON-LD support #2

Closed
warriordog opened this issue Apr 25, 2023 · 11 comments
Labels
area:code Affects or applies to the library code help wanted Extra attention is needed rejected:duplicate This issue or pull request already exists type:feature New feature or request

Comments

@warriordog
Owner

warriordog commented Apr 25, 2023

Full JSON-LD is probably not practical to support, but these features are critical for ActivityPub support:

  • Set context appropriately (serialization)
  • Read & verify context (de-serialization)
  • Allow multiple contexts - this is necessary to communicate with most real-world ActivityPub software (both)

Additionally, these would be nice to have:

  • Map fully-qualified property names (de-serialization)
  • Remap properties based on context (de-serialization)
  • Not limited to fixed list of known contexts (both)
  • Use System.Text.Json instead of a third-party library (both)
  • Support polymorphic properties (both)
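As a rough illustration of the first and third items, here is a minimal sketch of writing an @context array with System.Text.Json. NoteDto and ContextSerializationDemo are hypothetical names, not types from this library, and the second context URI is only an example of an extension context.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical minimal model, used only to show how "@context" can be written.
public sealed class NoteDto
{
    // "@context" is not a valid C# identifier, so the JSON name is set explicitly.
    [JsonPropertyName("@context")]
    public List<string> Context { get; set; } = new();

    [JsonPropertyName("type")]
    public string Type { get; set; } = "Note";

    [JsonPropertyName("content")]
    public string? Content { get; set; }
}

public static class ContextSerializationDemo
{
    // Serializes a note that declares every given context, in order.
    public static string Serialize(string content, params string[] contexts)
    {
        var note = new NoteDto { Content = content };
        note.Context.AddRange(contexts);
        return JsonSerializer.Serialize(note);
    }
}
```

A single context is often written as a bare string rather than an array; emitting that form would need a custom converter or a different property type.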
@warriordog warriordog added the type:feature New feature or request label Apr 25, 2023
@warriordog
Owner Author

For reference - there are branches with various partially-complete attempts to approach this from different angles. The main branch has another incomplete implementation.

@warriordog
Owner Author

There are more details in the roadmap files of each branch.

@warriordog warriordog added the area:code Affects or applies to the library code label Apr 25, 2023
@Nbjohnston86

I have been looking into JSON-LD, the libraries that exist, and the goals for this project. The JSON-LD site (https://json-ld.org/), which I was using to understand it better, really only listed two libraries that were available for use.

json-ld.net seems to be mainly for transforming one form of JSON-LD document into another (expanding, compacting, framing, and so on), and does not serialize/deserialize between JSON and C# classes.

dotNetRDF is very similar to json-ld.net, but does more than just that. It still does not serialize and deserialize between C# classes and JSON, though.

Given the desire to minimize external dependencies, I think the best bet is to use System.Text.Json.

The lower-level JsonNode and JsonDocument APIs (both part of System.Text.Json) are other good options to consider later, if typed serialization with JsonSerializer turns out to be insufficient for the project's needs.
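For comparison, a minimal sketch of the untyped route using JsonNode (namespace System.Text.Json.Nodes). NodeInspection is a hypothetical helper, and quoteUrl is only an example of an extension property name.

```csharp
#nullable enable
using System.Text.Json.Nodes;

public static class NodeInspection
{
    // Reads the "type" of an arbitrary ActivityPub document and checks whether it carries
    // a given extension property, without defining a C# class for the document.
    public static (string? Type, bool HasProperty) Inspect(string json, string propertyName)
    {
        if (JsonNode.Parse(json) is not JsonObject obj)
            return (null, false);

        // GetValue<string>() throws if "type" is not a JSON string (for example an array).
        string? type = obj["type"]?.GetValue<string>();
        return (type, obj.ContainsKey(propertyName));
    }
}
```

This avoids the type-mapping problem entirely, at the cost of losing compile-time property access everywhere else in the application.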

What have been the biggest hurdles to choosing an approach? I could be missing some key information.

@warriordog
Owner Author

warriordog commented Jun 21, 2023

@Nbjohnston86 Sorry for the very late response! I read your comment weeks ago but couldn't quite work out how to explain the problem. I'll try now:

The main challenge I've encountered is that many real-world uses of ActivityPub are basically "add Y additional context to X basic object type(s)". For example, Mastodon-compatible microblogging software usually implements quote posts by adding an additional context to the Note or Announce type. That context adds a few optional fields that describe the quote.

This is hard to implement universally because some other extensions, like HTTP Signatures IIRC, apply to all object types, including the base types. So you can't just declare SignedObject as a subclass of Object, because Note doesn't derive from SignedObject. You can't implement those features separately either, because you need a type that derives from Note, QuoteToot, and SignedObject simultaneously, and a C# class can only have one base class. Interfaces would appear to be the ideal solution, but System.Text.Json can't deserialize into an interface type, so you either need code generation (eww) or the library user must provide their own implementation (double eww).

Even simple uses can end up with a situation where one JSON-LD object needs to deserialize into a type like Note & SignedObject & QuoteToot. In languages like TypeScript, this is easy to solve with interfaces: everything deserializes to a plain object and is typed with an intersection type. But I'm not sure how to handle this cleanly in C#.
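To make the interface problem concrete, here is a small sketch (all type names hypothetical): asking System.Text.Json for an interface throws NotSupportedException, while a hand-written class implementing every interface works. That class is the "library user must provide their own implementation" option, and one would be needed for each combination of extensions.

```csharp
#nullable enable
using System;
using System.Text.Json;

public interface INote { string? Content { get; set; } }
public interface IQuote { string? QuoteUrl { get; set; } }

// A concrete class the application would have to write for each combination it supports.
public sealed class QuotedNote : INote, IQuote
{
    public string? Content { get; set; }
    public string? QuoteUrl { get; set; }
}

public static class InterfaceDemo
{
    static readonly JsonSerializerOptions Options = new() { PropertyNamingPolicy = JsonNamingPolicy.CamelCase };

    // Returns true if System.Text.Json refuses to deserialize directly into the interface.
    public static bool InterfaceDeserializationFails(string json)
    {
        try
        {
            JsonSerializer.Deserialize<INote>(json, Options);
            return false;
        }
        catch (NotSupportedException)
        {
            return true;
        }
    }

    // Deserializing into the concrete composite class works as usual.
    public static QuotedNote? DeserializeComposite(string json) =>
        JsonSerializer.Deserialize<QuotedNote>(json, Options);
}
```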

@warriordog
Owner Author

warriordog commented Jun 21, 2023

One idea, which is really not that great tbh, is to deserialize to a synthetic type that maps each context to the appropriate class in a one-to-one fashion. This would be the responsibility of library users, but it's not too much effort. Something like this:

```csharp
using ActivityPubSharp.Types;
using Some.ThirdParty.Lib;
namespace Some.ActivityPub.Application;

/// <summary>
/// Note object supported by this application
/// </summary>
public class SupportedNote {
  // This is provided by ActivityPubSharp and has the typical ActivityStreams Note properties.
  public ASNote Note { get; set; }

  // This is unique to Some.ActivityPub.Application and provides fields for a custom signature.
  public SignedObject Signature { get; set; }

  // This is provided by Some.ThirdParty.Lib and provides extra properties for quote posts.
  // Null if the context was not included in the source object.
  public QuotePost? Quote { get; set; }
}
```

The deserializer could read an attribute such as [JsonLdContext("uri")] from the type of each nested object, and that would identify which properties are meant to populate that object. For serialization, the same thing could work in reverse.
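A minimal sketch of the attribute lookup, assuming a hypothetical JsonLdContextAttribute and simplified stand-ins for ASNote and QuotePost (the real library types are not shown in this thread). It uses reflection to build a map from context URI to the wrapper property that should receive that context's data.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical attribute marking a class as the model for one JSON-LD context.
[AttributeUsage(AttributeTargets.Class, Inherited = false)]
public sealed class JsonLdContextAttribute : Attribute
{
    public JsonLdContextAttribute(string contextUri) => ContextUri = contextUri;
    public string ContextUri { get; }
}

// Hypothetical stand-ins for the types in the SupportedNote example.
[JsonLdContext("https://www.w3.org/ns/activitystreams")]
public class ASNote { public string? Content { get; set; } }

[JsonLdContext("https://example.com/quote")]
public class QuotePost { public string? QuoteUrl { get; set; } }

public class SupportedNote
{
    public ASNote Note { get; set; } = new();
    public QuotePost? Quote { get; set; }
    public string? LocalNotes { get; set; } // no context attribute, so it is not mapped
}

public static class ContextMapper
{
    // Maps each context URI to the wrapper property whose type declares that context.
    // ToDictionary throws if two properties declare the same context.
    public static Dictionary<string, PropertyInfo> MapContexts(Type wrapperType) =>
        wrapperType
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Select(p => (Prop: p, Attr: p.PropertyType.GetCustomAttribute<JsonLdContextAttribute>()))
            .Where(x => x.Attr != null)
            .ToDictionary(x => x.Attr!.ContextUri, x => x.Prop);
}
```

A deserializer could then look up each URI from the incoming @context in this map and populate only the matching properties, leaving the others null.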

@jenniferplusplus
Collaborator

jenniferplusplus commented Jul 6, 2023

@warriordog I've also been having a lot of trouble with this. I'm probably going to have to take a crack at it myself, unless a miracle happens in the next couple of weeks or so and you or someone else figures out a good deserialization strategy for this bonkers format.

In case it helps your thinking, my plan is to run the messages through json-ld.net and use it to frame them into a consistent shape that System.Text.Json can deserialize into a typed object. If that works, I'd want to share it as a package, with a mechanism that accepts additional programmer-provided framing documents in order to support extensions from additional @context properties.

@warriordog
Owner Author

@jenniferplusplus that's the best approach I've heard so far, but it seems kind of inefficient and wasteful. I'm not sure there's any way around it, though.

I did have one other idea, which works kind of like this:

  • Extend the base AS type (in this repo, that's ASType) to include three new properties:
    • public JsonElement OriginalJson { get; } - contains a copy of the JSON that was parsed into this object. It won't be updated when properties change because that's just too much lol
    • public JsonSerializerOptions OriginalJsonOptions { get; } - caches the options used to parse this object. I hate putting this here, it's awful and terrible, but unfortunately necessary for this to work.
    • public Dictionary<string, IExtension> LoadedExtensions { get; } - starts empty, but can be populated with cached extension data. I'll explain how that works in a second.
  • Create an empty interface IExtension which is the base for all extensions. User-made extensions should implement this in a class with their extension properties.
  • Create another interface IExtension<TObj> which extends from IExtension.
    • Add this property: public TObj Object { get; } which points back to the original, non-extension object.
  • Use a custom JsonConverter to write each loaded extension's properties after the original object's properties when serializing. Not trivial, but definitely doable, especially with the work in #21 (Temporary rewrite of entire JSON implementation).
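The declarations in that list could look roughly like this. It is a sketch only: the init accessors and the ASTypeFactory helper are assumptions, since something (most likely a custom converter) has to be able to set these members after parsing.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.Json;

// Base interface for all extensions. User-made extensions implement this (directly or via
// IExtension<TObj>) in a class holding their extension properties.
public interface IExtension { }

// An extension that can point back to the original, non-extension object.
public interface IExtension<TObj> : IExtension
{
    TObj Object { get; }
}

// Sketch of the proposed additions to the base AS type.
public class ASType
{
    public JsonElement OriginalJson { get; init; }
    public JsonSerializerOptions OriginalJsonOptions { get; init; } = new();
    public Dictionary<string, IExtension> LoadedExtensions { get; } = new();
}

// Hypothetical helper showing how the original JSON could be captured.
public static class ASTypeFactory
{
    public static ASType FromJson(string json, JsonSerializerOptions options)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        // Clone() copies the element so it stays valid after the JsonDocument is disposed.
        return new ASType { OriginalJson = doc.RootElement.Clone(), OriginalJsonOptions = options };
    }
}
```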

Now for the part that makes this somewhat kinda workable:

  • Each extension should define its context URI as a constant somewhere
  • And each extension should define these extension methods (pun not intended) on the type of object they extend:
    • public static SomeExtension? AsSomeExtension(this ASObject obj)
    • public static bool IsSomeExtension(this ASObject obj, out SomeExtension? ext)
  • Extension methods have two simple tasks:
    1. Check for and return the correct extension from the object. That can be done with obj.LoadedExtensions.TryGetValue(SomeExtensionId, out var ext)
    2. If that fails, then parse the extension from OriginalJson using OriginalJsonOptions. Once done, cache the result in LoadedExtensions and return it.

This might seem clunky, but the DX might be kind of OK compared to the other options. You could use it like this:

```csharp
using System.Diagnostics.CodeAnalysis;
using System.Text.Json;
using System.Text.Json.Serialization;

#region This can be in a NuGet package or wherever
public class Quote : IExtension<ASObject>
{
    public const string QuoteExtensionUri = "https://example.com/some.uri";

    /// <summary>
    /// This object, which quotes the target of <see cref="QuoteId"/>.
    /// </summary>
    [JsonIgnore]
    public ASObject Object { get; internal set; } = null!;

    /// <summary>
    /// URI / ID of the object that this object quotes
    /// </summary>
    public required ASLink QuoteId { get; set; }
}

public static class QuoteExtension
{
    public static Quote? AsQuote(this ASObject obj)
    {
        IsQuote(obj, out var quote);
        return quote;
    }

    public static bool IsQuote(this ASObject obj, [NotNullWhen(true)] out Quote? quote)
    {
        // Happy path - try to get from cache.
        // LoadedExtensions stores IExtension, so the cached value must be type-checked.
        if (obj.LoadedExtensions.TryGetValue(Quote.QuoteExtensionUri, out var cached) && cached is Quote cachedQuote)
        {
            quote = cachedQuote;
            return true;
        }

        // Not there; we need to create it
        try
        {
            quote = obj.OriginalJson.Deserialize<Quote>(obj.OriginalJsonOptions);
        }
        catch (JsonException)
        {
            // Thrown when the required QuoteId is missing, i.e. this object is not a quote
            quote = null;
        }

        if (quote == null)
            return false;

        // Set object - can't be done automatically.
        // ugly, but oh well
        quote.Object = obj;

        // Cache it for performance
        obj.LoadedExtensions[Quote.QuoteExtensionUri] = quote;

        return true;
    }
}
#endregion
```

```csharp
// This would be in application code somewhere
ASObject somePost = GetPostFromWherever();
DoSomethingWithPost(somePost);
DoAnotherThing(somePost);

// The fun part
if (somePost.IsQuote(out var quote))
{
    DoSomethingWithAQuote(quote);
    GetTheOriginalPost(quote.QuoteId);
}

MoreStuffWithObject(somePost);
```

There is a significant downside: unknown extensions won't be preserved round-trip. I have no idea if that's actually important, but it's something to be aware of. Another downside is the broken encapsulation (exposing JsonSerializerOptions to the entire application? yikes!). But overall, I think it's not too bad considering how awful this data format is.

@jenniferplusplus
Collaborator

I made a start at my framing plan and then gave it up because it's impossible to do in a way that's performant enough for this use case. And also writing those documents is harder than writing custom converters, somehow. So instead I'm trying a somewhat polymorphic deserialization strategy using custom converters. I don't yet have a good solution for extensions though. And I'm unexpectedly tripping over the @context object at the moment. But it seems promising.

Here, if you'd like to have a look.
https://github.com/Letterbook/Letterbook.ActivityPub
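For readers following along, here is a generic sketch of what "polymorphic deserialization using custom converters" can mean with System.Text.Json. It is not the code from that repository; the class names and the hard-coded type map are hypothetical. The converter buffers the object, reads "type", and re-deserializes into the mapped subclass.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical hierarchy. The base is abstract so every instance has a concrete
// runtime type, which keeps Write from recursing into this converter.
public abstract class ASObject
{
    public string? Type { get; set; }
    public string? Id { get; set; }
}
public sealed class ASNote : ASObject { public string? Content { get; set; } }
public sealed class ASPerson : ASObject { public string? Name { get; set; } }
// Fallback for "type" values that have no dedicated class.
public sealed class ASUnknownObject : ASObject { }

public sealed class ASObjectConverter : JsonConverter<ASObject>
{
    private static readonly Dictionary<string, Type> TypeMap = new()
    {
        ["Note"] = typeof(ASNote),
        ["Person"] = typeof(ASPerson),
    };

    public override ASObject? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        // Buffer the whole object so "type" can be found regardless of property order.
        using JsonDocument doc = JsonDocument.ParseValue(ref reader);
        JsonElement root = doc.RootElement;

        string? asType = root.TryGetProperty("type", out JsonElement t) && t.ValueKind == JsonValueKind.String
            ? t.GetString()
            : null;
        Type target = asType != null && TypeMap.TryGetValue(asType, out var mapped)
            ? mapped
            : typeof(ASUnknownObject);

        // CanConvert only matches ASObject itself, so deserializing a subclass does not recurse.
        return (ASObject?)root.Deserialize(target, options);
    }

    public override void Write(Utf8JsonWriter writer, ASObject value, JsonSerializerOptions options) =>
        JsonSerializer.Serialize(writer, value, value.GetType(), options);
}
```

Buffering into a JsonDocument is the simplest approach but costs an extra parse; the owner's comment below mentions discovering the type map via reflection instead of hard-coding it.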

@jenniferplusplus
Collaborator

oh, and a fact I recently learned that might make you feel better: JsonSerializerOptions becomes immutable after the first time it's used. So, exposing it to the rest of the application is less terrible than you might be thinking.
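A quick way to see this for yourself, as a sketch (OptionsFreezeDemo is a hypothetical name): after one serialization call, setting a property on the same options instance throws InvalidOperationException. The lock covers the options' own settings; any mutable state inside custom converters is not protected by it.

```csharp
using System;
using System.Text.Json;

public static class OptionsFreezeDemo
{
    // Returns true if changing the options after their first use throws.
    public static bool IsFrozenAfterFirstUse()
    {
        var options = new JsonSerializerOptions();
        JsonSerializer.Serialize(new { type = "Note" }, options); // first use locks the instance

        try
        {
            options.WriteIndented = true;
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```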

@warriordog
Owner Author

@jenniferplusplus

Your implementation looks pretty good so far! I'm also using a polymorphic converter, although I'm using reflection to figure out the AS type -> .NET type mappings at runtime. I really wish I'd learned about Utf8JsonReader.Skip - I re-implemented that function from scratch 🤦‍♀️
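For anyone else tempted to re-implement it, a minimal sketch of Utf8JsonReader.Skip in this setting: peeking at an object's top-level "type" without disturbing the caller's reader. PeekType is a hypothetical helper; it relies on Utf8JsonReader being a struct, so the parameter is an independent copy.

```csharp
#nullable enable
using System.Text.Json;

public static class TypePeeker
{
    // Returns the top-level "type" string of the object the reader is positioned at.
    // Only the string form is handled; an array-valued "type" is skipped and yields null.
    public static string? PeekType(Utf8JsonReader reader)
    {
        if (reader.TokenType != JsonTokenType.StartObject)
            throw new JsonException("Expected start of object.");

        while (reader.Read() && reader.TokenType == JsonTokenType.PropertyName)
        {
            bool isType = reader.ValueTextEquals("type");
            reader.Read(); // move to the property's value

            if (isType && reader.TokenType == JsonTokenType.String)
                return reader.GetString();

            // Skip the whole value (including nested objects/arrays) in one call.
            reader.Skip();
        }
        return null;
    }
}
```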

@context has also been tripping me up, particularly the fact that it can be any of:

  • A string
  • An object (with a really complex schema)
  • An array of strings or objects

Currently, I just support the string and single-element array forms. Eventually I'll circle back and support the other forms, but I still don't know how to handle the fact that property names and values can be shortened using the context. I think I might have to somehow pass the context down through each nested object.
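Accepting all three forms is mostly a matter of flattening them, as in this sketch (ContextReader is a hypothetical name). It collects context URIs and keeps inline context objects as raw JsonElements; actually interpreting those term definitions, which is the shortening problem described above, is not attempted. JSON-LD also allows null as a context value, which this sketch rejects.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.Json;

public static class ContextReader
{
    public const string ActivityStreamsContext = "https://www.w3.org/ns/activitystreams";

    // Flattens a string, object, or array @context into URIs plus inline context objects.
    public static (List<string> Uris, List<JsonElement> Inline) Read(JsonElement context)
    {
        var uris = new List<string>();
        var inline = new List<JsonElement>();
        Collect(context, uris, inline, allowArray: true);
        return (uris, inline);
    }

    private static void Collect(JsonElement element, List<string> uris, List<JsonElement> inline, bool allowArray)
    {
        switch (element.ValueKind)
        {
            case JsonValueKind.String:
                uris.Add(element.GetString()!);
                break;
            case JsonValueKind.Object:
                // Term definitions are kept as-is; Clone() lets them outlive the JsonDocument.
                inline.Add(element.Clone());
                break;
            case JsonValueKind.Array when allowArray:
                // Arrays may contain strings or objects, but not nested arrays.
                foreach (JsonElement item in element.EnumerateArray())
                    Collect(item, uris, inline, allowArray: false);
                break;
            default:
                throw new JsonException($"Unsupported @context value: {element.ValueKind}");
        }
    }

    // "Read & verify context": checks that the ActivityStreams context is declared.
    public static bool HasActivityStreams(JsonElement context) =>
        Read(context).Uris.Contains(ActivityStreamsContext);
}
```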

Thanks for that tip about JsonSerializerOptions! That does make it a bit less bad.

@warriordog
Owner Author

The current JSON implementation is reasonably workable, and it's likely the best we can do until System.Text.Json is overhauled to expose its internal serializers (see #21). That leaves two remaining aspects that are now covered elsewhere:

That means we can finally close this issue! 🥳

@warriordog warriordog closed this as not planned Jul 20, 2023
@warriordog warriordog added the rejected:duplicate This issue or pull request already exists label Jul 20, 2023