Skip to content

viraltux/SQLdf.jl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SQLdf

This package allows SQL queries on any Julia type implementing the Table.jl interface an in particular the DataFrame type from DataFrames.jl. All results from a query are returned in a DataFrame type.

Versions prior to 0.2.0 would use R/sqldf via RCall. Although this is still possible by executing setRDB("R") before using sqldf, the default SQL engine is SQLite from SQLite.jl. If the default RDB is changed to "R" it can be set to its default with setRDB("SQLite").

The use of setRDB("R") requires R to be installed with sqldf within.

Introduction

Simply treat DataFrame objects (or any type implementing Tables.jl) like SQL tables.

Examples

using SQLdf

Simple Queries on DataFrames

DF = DataFrame(a=1:14,  b=14:-1:1, c = split("Julia is great",""))

@sqldf "select count(*) from DF"
1×1 DataFrame
 Row │ count(*) 
     │ Int64    
─────┼──────────
   114

    
sqldf("""
      SELECT * 
      FROM DF
      WHERE a <= 5
      ORDER BY  a
      """)
5×3 DataFrame
 Row │ a      b      c      
     │ Int64  Int64  String 
─────┼──────────────────────
   11     14  J
   22     13  u
   33     12  l
   44     11  i
   55     10  a

Join DataFrames Query

T = DataFrame((a=1:14, c=split("Julia is fast!","")))

sqldf("""
      select * 
      from DF join T on DF.b = T.a
      order by DF.a
      """)
14×5 DataFrame
 Row │ a      b      c       a:1    c:1    
     │ Int64  Int64  String  Int64  String 
─────┼─────────────────────────────────────
   11     14  J          14  !
   22     13  u          13  t
   33     12  l          12  s
   44     11  i          11  a
   55     10  a          10  f
   66      9              9
   77      8  i           8  s
   88      7  s           7  i
   99      6              6
  1010      5  g           5  a
  1111      4  r           4  i
  1212      3  e           3  l
  1313      2  a           2  u
  1414      1  t           1  J

Join Query Types implementing Tables interface

using TimeSeries

dates = Date(2021, 1, 1):Day(1):Date(2021, 1, 14)
TA = TimeArray(dates, 1:14)

@sqldf "select * from TA join DF where TA.A = DF.a"
14×5 DataFrame
 Row │ timestamp   A      a:1    b      c      
     │ Date        Int64  Int64  Int64  String 
─────┼─────────────────────────────────────────
   12018-01-01      1      1     14  J
   22018-01-02      2      2     13  u
   32018-01-03      3      3     12  l
   42018-01-04      4      4     11  i
   52018-01-05      5      5     10  a
   62018-01-06      6      6      9
   72018-01-07      7      7      8  i
   82018-01-08      8      8      7  s
   92018-01-09      9      9      6
  102018-01-10     10     10      5  g
  112018-01-11     11     11      4  r
  122018-01-12     12     12      3  e
  132018-01-13     13     13      2  a
  142018-01-14     14     14      1  t