Doc/dps arrays #110

Closed
wants to merge 5 commits into from

Conversation

Divesh-Otwani
Contributor

I documented DPS arrays (and did some background reading on them).

(By the way, I don't see how this implementation reduces the number of allocations; it seems like it doesn't actually help -- though I'm probably missing something. The example I made is doable with exactly one allocation of a temporary array of size n in C. This is not the case here.)

@aspiwack
Member

I don't know what your question actually is. But if what you are trying to do is compare mutable arrays and destinations, my suggestion is to see how the interaction of split and freeze (the latter being part of the allocation primitive in destinations) is different.

@aspiwack
Member

aspiwack commented Aug 6, 2020

@utdemir I'll leave it to you to review this PR, too.

Contributor

@utdemir utdemir left a comment

It looks like I am not the best person to review this, since I do not completely understand in which cases I would use this module, instead of Vector.fromList or Pull/Push arrays.

-- (i.e., [deforesting](https://www.sciencedirect.com/science/article/pii/030439759090147A)),
-- be allocated, filled, passed along and de-allocated. When the allocation
-- of these arrays is controlled by the programmer and not
-- done by Haskell's GC (garbage collector), programs are often more efficient.
Contributor

I am probably mistaken, but I don't see how just using this module makes the programs more efficient. If I'm wrong, it'd be nice to explain why here :).

Contributor

After reading Arnaud's explanation, I think we should reword this last sentence with something like:

Destination arrays make it possible to reduce the number of allocations
without relying on compiler optimisations or fusion rules.

-- >
-- > inputVector :: IO (Vector Int)
-- > inputVector =
-- > return (fromList (map (\x -> (7 * (x+3)) `div` 11) [1..100]))
Contributor

Instead of describing an IO action, I think the example would be shorter if it just described a Vector Int -> Vector Int function, e.g.:

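import Data.Vector (Vector)
import qualified Data.Vector as Vector
import qualified Data.Array.Destination as DPS

-- Placeholder: a DPS function that writes its results into the destination it is given.
computeDiff :: Vector Int -> DPS.DArray Int #-> ()
computeDiff = undefined

-- The caller decides the size and allocates once; computeDiff only fills.
vectorDiff :: Vector Int -> Vector Int
vectorDiff vec =
  let diffSize = Vector.length vec - 1
  in  DPS.alloc diffSize (computeDiff vec)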

--
-- Since 'DArray' doesn't have a 'Consumable' instance, the only way to
-- consume it is with the given API (e.g., with 'fill' or perhaps
-- 'fromFunction') which fills the destiniation array completely
Contributor

Typo here on destination.

-- Linear types are used to ensure that the destination array
-- is always written to. Why? Well:
--
-- Because of linear types, any function that uses a destination array
Contributor

"because of linear types" sounds a bit hand-wavy. The explanation after this sentence is clear, so maybe we should just start with that ("The only way to create ...").

@aspiwack
Member

@utdemir I believe, on the contrary, that it makes you the best person to review: the documentation is supposed to make you understand why you want to use this.

I'm available for specific questions though.

@utdemir
Contributor

utdemir commented Aug 25, 2020

@aspiwack I think I am missing the main point, because I still don't understand how this can be more efficient than just calling something like Vector.fromListN inside the function.

The only thing I can think of is that creating a DArray once, splitting it, and filling the parts using two different functions would be faster than combining two vectors after the fact. But the module gives the impression that this is not the only point.

@aspiwack
Member

aspiwack commented Sep 3, 2020

Sorry, sorry, I forgot to reply to this one. I promise that I'm catching up with my email and stuff.

“More efficient” is a bit vague. The general way that I think about destinations is that taking a destination instead of returning an array lets you ask a new question, namely: whose responsibility is it to allocate the array? When you return an array, you have no choice: it is you who will allocate the array. Whereas when you take a destination as an argument, it can be someone else.

At this point, exercise: show that from a DPS function f :: a -> DArray b #-> (), you can define an array-returning function f' :: a -> Array b. So DPS is indeed the more general form.
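
One way to picture the exercise, as a minimal sketch rather than the library's implementation: the model below uses plain `Data.Array.ST` (from the `array` package that ships with GHC) instead of linear types, so nothing forces the callee to fill every cell. `Dest`, `fillMultiples`, `fromDPS`, and `multiples` are hypothetical names, and the explicit length argument is an assumption, since the exercise does not say where the size comes from.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad.ST (ST)
import Data.Array.ST (STUArray, getBounds, newArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, elems)

-- Hypothetical stand-in for a destination: a mutable array the callee writes to.
-- Unlike a linear DArray, nothing here forces every cell to be written.
type Dest s = STUArray s Int Int

-- A DPS-style function: it allocates nothing and fills the destination it receives.
fillMultiples :: Int -> Dest s -> ST s ()
fillMultiples k dest = do
  (lo, hi) <- getBounds dest
  mapM_ (\i -> writeArray dest i (k * i)) [lo .. hi]

-- The caller allocates once, hands the destination over, then freezes the result.
fromDPS :: Int -> (forall s. Dest s -> ST s ()) -> UArray Int Int
fromDPS n fillDest = runSTUArray (do
  dest <- newArray (0, n - 1) 0
  fillDest dest
  return dest)

-- The array-returning function recovered from the DPS one.
multiples :: Int -> Int -> UArray Int Int
multiples n k = fromDPS n (fillMultiples k)

main :: IO ()
main = print (elems (multiples 4 3))  -- [0,3,6,9]
```

The reverse direction is the one that costs something: an array-returning function has already allocated by the time the caller sees its result, so turning it into a DPS function would need a copy.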

There can be many reasons for doing such a thing. For one, destinations can be split, so I can give one part to one thread and another part to another thread. If I used array-returning functions, both of these threads would allocate an array, and I would have to copy them into a new array on my way out. Vector depends on array fusion to avoid such extra allocation, but it's really not reliable (if these are really threads (e.g. async), vector fusion will not trigger). Another case where fusion will most certainly not work is when you want to fill in a memory-mapped buffer, since it is not a normal vector thing.
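
The splitting argument can be sketched in the same spirit. This is a sequential, non-linear model built on `Data.Array.ST`, not linear-base's actual code: the names `alloc`, `split`, and `fromFunction` echo the ones mentioned in this thread, but the `Dest` type and all signatures here are assumptions made for illustration. The real module relies on linear types to guarantee that each half is filled exactly once, which this model does not enforce.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad.ST (ST)
import Data.Array.ST (STUArray, newArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, elems)

-- Hypothetical destination: a window (offset, length) into a shared buffer.
data Dest s = Dest (STUArray s Int Int) Int Int

-- Allocate one buffer, let the callback fill it, then freeze it without copying.
alloc :: Int -> (forall s. Dest s -> ST s ()) -> UArray Int Int
alloc n fillDest = runSTUArray (do
  buf <- newArray (0, n - 1) 0
  fillDest (Dest buf 0 n)
  return buf)

-- Split a destination into two non-overlapping windows over the same buffer.
split :: Int -> Dest s -> (Dest s, Dest s)
split k (Dest buf off len) = (Dest buf off k', Dest buf (off + k') (len - k'))
  where k' = max 0 (min k len)

-- Fill a window from a function of the window-local index.
fromFunction :: (Int -> Int) -> Dest s -> ST s ()
fromFunction f (Dest buf off len) =
  mapM_ (\i -> writeArray buf (off + i) (f i)) [0 .. len - 1]

-- Two independent producers write into one allocation; nothing is
-- concatenated or copied afterwards.
both :: UArray Int Int
both = alloc 5 (\dest -> do
  let (left, right) = split 2 dest
  fromFunction (* 10) left
  fromFunction (+ 100) right)

main :: IO ()
main = print (elems both)  -- [0,10,100,101,102]
```

With array-returning producers, the same result would take two allocations plus a third for the concatenation; here the only allocation is the one in `alloc`.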

Further reading:

@Divesh-Otwani
Contributor Author

Closing in favor of #238.

@Divesh-Otwani Divesh-Otwani deleted the doc/dps-arrays branch October 13, 2020 17:22
3 participants