library for working with tabular data in Julia

JuliaData

Library for working with tabular data in Julia using DataFrames.

Features

  • DataFrame for efficient tabular storage of two-dimensional data
  • Minimized data copying
  • Default columns can handle missing values (NAs) of any type
  • PooledDataFrame for efficient storage of factor-like arrays for characters, integers, and other types
  • Flexible indexing
  • SubDataFrame for efficient subset referencing without copies
  • Grouping operations inspired by plyr, pandas, and data.table
  • Basic merge functionality
  • stack and unstack for long/wide conversions
  • Pipelining support (|) for many operations
  • Several typical R-style functions, including head, tail, summary, unique, duplicated, with, within, and more
  • Formula and design matrix implementation

Demos

Here's a minimal demo showing some grouping operations:

julia> load("src/init.jl")     # run from the JuliaData directory; to be replaced once packages arrive

julia> d = DataFrame(quote     # expressions are one way to create a DataFrame
           x = randn(10)
           y = randn(10)
           i = randi(3,10)
           j = randi(3,10)
       end);

julia> dump(d)    # dump() is like R's str()
DataFrame  10 observations of 4 variables
  x: DataVec{Float64}(10) [-0.22496343871037897,-0.4033933555989207,0.6027847717547058,0.06671669747901597]
  y: DataVec{Float64}(10) [0.21904975091285417,-1.3275512477731726,2.266353546459277,-0.19840910239041679]
  i: DataVec{Int64}(10) [2,1,3,1]
  j: DataVec{Int64}(10) [3,2,1,2]

julia> head(d)
DataFrame  (6,4)
                x         y i j
[1,]    -0.224963   0.21905 2 3
[2,]    -0.403393  -1.32755 1 2
[3,]     0.602785   2.26635 3 1
[4,]    0.0667167 -0.198409 1 2
[5,]      1.68303  -1.11183 1 3
[6,]     0.346034   1.68227 2 1

julia> d[1:3, ["x","y"]]     # indexing is similar to R's
DataFrame  (3,2)
                x        y
[1,]    -0.224963  0.21905
[2,]    -0.403393 -1.32755
[3,]     0.602785  2.26635

julia> # Group on column i, and pipe (|) that result to an expression
julia> # that creates the column x_sum. 
julia> groupby(d, "i") | :(x_sum = sum(x))     
DataFrame  (3,2)
        i    x_sum
[1,]    1  2.06822
[2,]    2 -1.80867
[3,]    3 0.319517

julia> groupby(d, "i") | :sum   # Another way to operate on a grouping
DataFrame  (3,4)
        i    x_sum    y_sum j_sum
[1,]    1  2.06822 -2.73985     8
[2,]    2 -1.80867  1.83489     7
[3,]    3 0.319517  1.03072     2
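
To make the grouped sum above concrete, here is a minimal sketch of what "group on column i, then sum x within each group" computes, written in plain Julia with no JuliaData types. The helper `group_sum` and the sample vectors are hypothetical and purely illustrative; this is not how the library implements `groupby`. It also assumes a recent Julia release, whose syntax may differ from the Julia version this README targets.

```julia
# Hypothetical helper, not part of JuliaData: sum `vals` within each
# distinct value of `group_ids`.
function group_sum(group_ids::AbstractVector, vals::AbstractVector)
    length(group_ids) == length(vals) ||
        throw(ArgumentError("group_ids and vals must have the same length"))
    sums = Dict{eltype(group_ids), eltype(vals)}()
    for (g, v) in zip(group_ids, vals)
        # Start each group at zero, then accumulate.
        sums[g] = get(sums, g, zero(v)) + v
    end
    return sums
end

# Illustrative data standing in for columns i and x of the demo DataFrame.
i = [2, 1, 3, 1, 1, 2]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

x_sum = group_sum(i, x)

# Print one line per group, ordered by group id.
for g in sort(collect(keys(x_sum)))
    println(g, " => ", x_sum[g])
end
```

The result has one entry per distinct value of `i`, which corresponds to the one-row-per-group shape of the `x_sum` DataFrame shown above.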

See demo/workflow_demo.jl for a basic demo of the parts of a Julian data workflow.

See demo/design_demo.jl for a more in-depth demo of DataFrame, its related types, and the rest of the library.

Documentation

Development work

The issue tracker lists a number of known problems and ideas for enhancements. Here are some particular enhancements under way or under discussion:

Possible changes to Julia

DataFrames fit well with Julia's syntax, but some features would improve the user experience, including keyword function arguments (Julia issue 485), "~" for easier expression syntax, and overloading "." for easier column access (df.colA). See here for a bit more information.

Current status

Please consider this a development preview. Many things work, but expect some rough edges. We hope that this can become a standard Julia package.
