xlang.datalake

A data lake built on xlang and the native file system, supporting namespaces, blobs, replication, versioning, and SQL

Background

An open-source project is a programmer's novel.

In the past, I accumulated a lot of experience with Microsoft's document-based database Cosmos DB (Azure Cosmos DB) and its script language Scope (known externally as U-SQL). For comparison with Cosmos DB, I also did some technical research on MongoDB. That is one line of background; the others are:

I have long experience with relational databases, and between 2014 and 2016 I tried to integrate the node-based graph database Neo4j into my product. During that time I paid close attention to NoSQL databases, and so on.
During deep learning research, product development, and big data work, the dataset is a very important concept. What is a dataset? A collection of data with any free-style hierarchical structure.
Furthermore, consider Python pickle serialization: PyTorch uses it directly to store model weights, so the design also needs to take it into account. Is that an interesting idea or a strange one?
When we build a website, for example a search-based documentation site like https://numpy.org, can we directly use the existing files in the native file system as its searchable document database?
What if we want to build a website with images and even large files such as videos (.mp4, .avi)? Most of the time we store these files directly in the file system rather than putting them into a relational database as BLOBs.
For a database (SQL or NoSQL), the schema is very important: should it be a predefined schema or just a metadata-based schema?
When you have a very large number of files stored on your local disk, how do you search them quickly? Treat the disk like a data lake with automatic indexing that can also be replicated to other computers, as if you were using a cloud-based data lake.
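The idea of treating a local folder as an automatically indexed, SQL-searchable store can be sketched as follows. This is only an illustration in Python, not xlang and not this project's implementation: the function names `build_index` and `find_by_extension`, the `files` table, and its columns are all hypothetical. It walks a directory tree, records file-system metadata for each file in an in-memory SQLite table, and answers a query with plain SQL.

```python
import mimetypes
import os
import sqlite3


def build_index(root):
    """Walk `root` and store one row of file-system metadata per file.

    Hypothetical helper: the table name and columns are assumptions
    made for this sketch, not part of xlang.datalake.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files ("
        " path TEXT PRIMARY KEY,"  # path relative to root, '/'-separated
        " name TEXT,"
        " ext TEXT,"
        " size INTEGER,"
        " mtime REAL,"
        " mime TEXT)"
    )
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            st = os.stat(full)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            ext = os.path.splitext(filename)[1].lower()
            # guess_type returns (type, encoding); type may be None
            mime, _encoding = mimetypes.guess_type(filename)
            conn.execute(
                "INSERT INTO files VALUES (?, ?, ?, ?, ?, ?)",
                (rel, filename, ext, st.st_size, st.st_mtime, mime),
            )
    conn.commit()
    return conn


def find_by_extension(conn, ext):
    """Return the relative paths of all indexed files with extension `ext`."""
    rows = conn.execute(
        "SELECT path FROM files WHERE ext = ? ORDER BY path", (ext,)
    )
    return [row[0] for row in rows]
```

For example, `find_by_extension(build_index("docs"), ".md")` would list every Markdown file under a `docs` folder. A real index would also need incremental updates and persistence, which this sketch leaves out.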

Important Concepts and terms

SQL-based query/update, with extended grammar
ACID is a must-have
Document database supporting JSON, YAML, HTML, Excel files, Word files, PDFs, image files, video files, etc.
Document metadata: retrieved from the file system and file headers.
Document parsers: for example, a PDF parser, an MS Word parser, etc.
Structured data
Table: still in the relational database domain; consider adding column-based tables like Parquet
Join operator across all kinds of data
A container is just like a folder, and a folder path looks like a namespace
But a file can also act as a container; for example, a SQLite file is a container
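The container idea in the last two items can be sketched as follows, again in Python for illustration only. All names here (`namespace_of`, `is_sqlite_file`, `list_members`) are hypothetical, and the dotted namespace form is an assumption; the text above only says that a folder path looks like a namespace. A folder lists its entries as members, while a SQLite file, recognized by the 16-byte header defined in the SQLite file format, lists its tables.

```python
import os
import sqlite3

# Every SQLite 3 database file starts with this 16-byte header string.
SQLITE_MAGIC = b"SQLite format 3\x00"


def is_sqlite_file(path):
    """Return True if `path` is a regular file with the SQLite 3 header."""
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        return f.read(16) == SQLITE_MAGIC


def namespace_of(root, folder):
    """Map a folder under `root` to a dotted namespace (assumed notation).

    Folder names that themselves contain dots would be ambiguous in this
    notation; the sketch does not handle that case.
    """
    rel = os.path.relpath(folder, root)
    if rel == ".":
        return ""
    return ".".join(rel.split(os.sep))


def list_members(path):
    """List the members of a container: folder entries or SQLite tables."""
    if os.path.isdir(path):
        return sorted(os.listdir(path))
    if is_sqlite_file(path):
        conn = sqlite3.connect(path)
        try:
            rows = conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            ).fetchall()
        finally:
            conn.close()
        return [row[0] for row in rows]
    return []  # ordinary files are treated as leaves in this sketch
```

With this shape, a query layer could walk `sales/2024/orders.db` the same way whether a path segment is a folder or a database file, which is one possible reading of "a file can also act as a container".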

Implementation

Using xlang as the primary coding language
Using C++ to write libraries imported into xlang to integrate with various parsers (Word, Excel, PDF...)

Planning is under way, and contributors are welcome to join this project
