<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Data mining</title><link>https://www.data-mining.co.nz/</link><description>Commercial data mining activity at the University of Waikato</description><atom:link href="https://www.data-mining.co.nz/rss.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2024 <a href="mailto:fracpete@waikato.ac.nz">University of Waikato</a> </copyright><lastBuildDate>Wed, 05 Jun 2024 04:32:23 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Faster Whisper 1.0.2 (speech-to-text)</title><link>https://www.data-mining.co.nz/news/2024-05-28-faster-whisper/</link><dc:creator>University of Waikato</dc:creator><description><p>New Docker images are now available for speech-to-text using <a class="reference external" href="https://github.com/SYSTRAN/faster-whisper">Faster Whisper</a> 1.0.2:</p>
<p><a class="reference external" href="https://github.com/waikato-llm/whisper/tree/main/faster-whisper-1.0.2_cuda12.1">https://github.com/waikato-llm/whisper/tree/main/faster-whisper-1.0.2_cuda12.1</a></p>
<p><a class="reference external" href="https://github.com/waikato-llm/whisper/tree/main/faster-whisper-1.0.2_cpu">https://github.com/waikato-llm/whisper/tree/main/faster-whisper-1.0.2_cpu</a></p>
<p>Faster Whisper is a reimplementation of OpenAI's Whisper library with some <a class="reference external" href="https://github.com/SYSTRAN/faster-whisper#benchmark">dramatic speed ups</a>.</p>
<p>With the release of these images, the Coqui STT images have been retired (just like the <a class="reference external" href="https://github.com/coqui-ai/STT/blob/main/README.rst">Coqui STT project itself</a>).</p></description><category>release</category><guid>https://www.data-mining.co.nz/news/2024-05-28-faster-whisper/</guid><pubDate>Mon, 27 May 2024 20:29:00 GMT</pubDate></item><item><title>image-dataset-converter release</title><link>https://www.data-mining.co.nz/news/2024-05-06-idc-release/</link><dc:creator>University of Waikato</dc:creator><description><p>Based on lessons learned from our <a class="reference external" href="https://github.com/waikato-ufdl/wai-annotations">wai-annotations</a> library,
we simplified and streamlined the design of a new data processing library (though limited to just image datasets).
Of course, it makes use of the latest <a class="reference external" href="https://github.com/waikato-datamining/seppl">seppl</a> version, which also
simplifies how plugins are located at runtime and at development time.</p>
<p>The new kid on the block is called <strong>image-dataset-converter</strong> and its code is located here:</p>
<p><a class="reference external" href="https://github.com/waikato-datamining/image-dataset-converter">https://github.com/waikato-datamining/image-dataset-converter</a></p>
<p>Whilst it is based on wai-annotations, it already contains additional functionality.</p>
<p>And, of course, we also have resources demonstrating how to use the new library:</p>
<p><a class="reference external" href="https://www.data-mining.co.nz/image-dataset-converter-examples/">https://www.data-mining.co.nz/image-dataset-converter-examples/</a></p></description><category>release</category><guid>https://www.data-mining.co.nz/news/2024-05-06-idc-release/</guid><pubDate>Mon, 06 May 2024 04:12:00 GMT</pubDate></item><item><title>llm-dataset-converter release</title><link>https://www.data-mining.co.nz/news/2024-05-06-ldc-release/</link><dc:creator>University of Waikato</dc:creator><description><p>Version 0.2.3 of our <em>llm-dataset-converter</em> library is now available.</p>
<p>Quite a number of changes have happened since the first release last year, like xtuner support,
so check out the full change log here:</p>
<p><a class="reference external" href="https://github.com/waikato-llm/llm-dataset-converter/blob/main/CHANGES.rst">https://github.com/waikato-llm/llm-dataset-converter/blob/main/CHANGES.rst</a></p></description><category>release</category><guid>https://www.data-mining.co.nz/news/2024-05-06-ldc-release/</guid><pubDate>Mon, 06 May 2024 01:36:00 GMT</pubDate></item><item><title>XTuner Docker images available</title><link>https://www.data-mining.co.nz/news/2024-04-22-xtuner-docker/</link><dc:creator>University of Waikato</dc:creator><description><p>Docker images for <a class="reference external" href="https://github.com/InternLM/xtuner">XTuner</a> 0.1.18 are now available:</p>
<ul class="simple">
<li><p>In-house registry:</p>
<ul>
<li><p><code class="docutils literal"><span class="pre">public.aml-repo.cms.waikato.ac.nz:443/pytorch/pytorch-xtuner:0.1.18_cuda11.7</span></code></p></li>
</ul>
</li>
<li><p>Docker hub:</p>
<ul>
<li><p><code class="docutils literal"><span class="pre">waikatodatamining/pytorch-xtuner:0.1.18_cuda11.7</span></code></p></li>
</ul>
</li>
</ul>
<p>XTuner 0.1.18 now supports the just-released Llama-3 models (e.g.,
<a class="reference external" href="https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct">Meta-Llama-3-8B-Instruct</a>).</p></description><category>release</category><guid>https://www.data-mining.co.nz/news/2024-04-22-xtuner-docker/</guid><pubDate>Sun, 21 Apr 2024 23:27:00 GMT</pubDate></item><item><title>MMPretrain 1.2.0 Docker images available</title><link>https://www.data-mining.co.nz/news/2024-03-14-mmpretrain-docker/</link><dc:creator>University of Waikato</dc:creator><description><p>First Docker images are available for the <a class="reference external" href="https://github.com/open-mmlab/mmpretrain">MMPretrain</a>
framework, using the 1.2.0 release of MMPretrain (code base as of 2024-01-05):</p>
<ul class="simple">
<li><p><a class="reference external" href="https://github.com/waikato-datamining/mmpretrain/tree/master/1.2.0_cuda11.1">CUDA 11.1</a></p></li>
<li><p><a class="reference external" href="https://github.com/waikato-datamining/mmpretrain/tree/master/1.2.0_cpu">CPU</a></p></li>
</ul>
<p><strong>NB:</strong> MMPretrain is the successor of MMClassification and can be used for image classification.</p></description><category>release</category><guid>https://www.data-mining.co.nz/news/2024-03-14-mmpretrain-docker/</guid><pubDate>Wed, 13 Mar 2024 22:55:00 GMT</pubDate></item><item><title>XTuner Docker images available</title><link>https://www.data-mining.co.nz/news/2024-02-27-xtuner-docker/</link><dc:creator>University of Waikato</dc:creator><description><p><a class="reference external" href="https://github.com/InternLM/xtuner">XTuner</a> is an efficient, flexible and full-featured toolkit for fine-tuning
large models (InternLM, Llama, Baichuan, Qwen, ChatGLM) and is released under the Apache 2.0 license. The advantage
of this framework is that it is not tied down to a specific LLM architecture, but supports multiple ones out of the box.
With the just-released version <a class="reference external" href="https://github.com/waikato-llm/llm-dataset-converter/releases/tag/v0.2.0">v0.2.0</a>
of our <a class="reference external" href="https://github.com/waikato-llm/llm-dataset-converter">llm-dataset-converter</a> Python library,
you can read and write the XTuner JSON format (and apply the usual filtering, of course).</p>
<p>Here are the newly added image tags:</p>
<ul class="simple">
<li><p>In-house registry:</p>
<ul>
<li><p><code class="docutils literal"><span class="pre">public.aml-repo.cms.waikato.ac.nz:443/pytorch/pytorch-xtuner:2024-02-19_cuda11.7</span></code></p></li>
</ul>
</li>
<li><p>Docker hub:</p>
<ul>
<li><p><code class="docutils literal"><span class="pre">waikatodatamining/pytorch-xtuner:2024-02-19_cuda11.7</span></code></p></li>
</ul>
</li>
</ul>
<p>Of course, you can use these Docker images in conjunction with our <a class="reference external" href="https://www.data-mining.co.nz/news/2023-11-03-gifr-release/">gifr</a>
Python library for <a class="reference external" href="https://www.gradio.app/">gradio</a> interfaces as well (<cite>gifr-textgen</cite>). Just now we released
version 0.0.4 of the library, which is more flexible with regard to text generation: it can now send and receive
the conversation history and also parse JSON responses.</p></description><category>release</category><guid>https://www.data-mining.co.nz/news/2024-02-27-xtuner-docker/</guid><pubDate>Tue, 27 Feb 2024 03:40:00 GMT</pubDate></item><item><title>Text classification support</title><link>https://www.data-mining.co.nz/news/2024-02-15-text-classification-support/</link><dc:creator>University of Waikato</dc:creator><description><p>Large language models (LLMs) for chatbots are all the rage at the moment, but there is plenty of scope for simpler
tasks like text classification. Requiring fewer resources and being a lot faster is nice as well.</p>
<p>We turned the <a class="reference external" href="https://huggingface.co/docs/transformers/v4.36.1/en/tasks/sequence_classification">HuggingFace example</a>
for sequence classification into a Docker image to make it easy to build such classification models.</p>
<ul class="simple">
<li><p>In-house registry:</p>
<ul>
<li><p><code class="docutils literal"><span class="pre">public.aml-repo.cms.waikato.ac.nz:443/pytorch/pytorch-huggingface-transformers:4.36.0_cuda11.7_classification</span></code></p></li>
</ul>
</li>
<li><p>Docker hub:</p>
<ul>
<li><p><code class="docutils literal"><span class="pre">waikatodatamining/pytorch-huggingface-transformers:4.36.0_cuda11.7_classification</span></code></p></li>
</ul>
</li>
</ul>
<p>Our <a class="reference external" href="https://github.com/waikato-datamining/gifr">gifr</a>
Python library for <a class="reference external" href="https://www.gradio.app/">gradio</a> received an interface for text
classification (<cite>gifr-textclass</cite>) in version 0.0.3.</p>
<p>The <a class="reference external" href="https://github.com/waikato-llm/llm-dataset-converter">llm-dataset-converter</a> library
obtained native support for text classification formats with version 0.1.1.</p></description><category>release</category><guid>https://www.data-mining.co.nz/news/2024-02-15-text-classification-support/</guid><pubDate>Thu, 15 Feb 2024 03:46:00 GMT</pubDate></item><item><title>Llama-2 Docker images available</title><link>https://www.data-mining.co.nz/news/2023-11-10-llama2-docker/</link><dc:creator>University of Waikato</dc:creator><description><p>Llama-2, despite <a class="reference external" href="https://blog.opensource.org/metas-llama-2-license-is-not-open-source/">not actually being open-source as advertised</a>,
is a very powerful large language model (LLM), which can also be fine-tuned with custom data. With
version <a class="reference external" href="https://github.com/waikato-llm/llm-dataset-converter/releases/tag/v0.0.3">v0.0.3</a>
of our <a class="reference external" href="https://github.com/waikato-llm/llm-dataset-converter">llm-dataset-converter</a> Python library,
it is now possible to generate data in <a class="reference external" href="https://jsonlines.org/">jsonlines</a> format that the new
<a class="reference external" href="https://github.com/waikato-llm/huggingface_transformers/tree/master/4.31.0_cuda11.7_llama2">Docker images</a>
for Llama-2 can consume:</p>
<ul class="simple">
<li><p>In-house registry:</p>
<ul>
<li><p><code class="docutils literal"><span class="pre">public.aml-repo.cms.waikato.ac.nz:443/pytorch/pytorch-huggingface-transformers:4.31.0_cuda11.7_llama2</span></code></p></li>
</ul>
</li>
<li><p>Docker hub:</p>
<ul>
<li><p><code class="docutils literal"><span class="pre">waikatodatamining/pytorch-huggingface-transformers:4.31.0_cuda11.7_llama2</span></code></p></li>
</ul>
</li>
</ul>
<p>Of course, you can use these Docker images in conjunction with our <a class="reference external" href="https://www.data-mining.co.nz/news/2023-11-03-gifr-release/">gifr</a>
Python library for <a class="reference external" href="https://www.gradio.app/">gradio</a> interfaces as well (<cite>gifr-textgen</cite>).</p></description><category>release</category><guid>https://www.data-mining.co.nz/news/2023-11-10-llama2-docker/</guid><pubDate>Fri, 10 Nov 2023 03:33:00 GMT</pubDate></item><item><title>gifr release</title><link>https://www.data-mining.co.nz/news/2023-11-03-gifr-release/</link><dc:creator>University of Waikato</dc:creator><description><p>A lot of our Docker images allow the user to make predictions in two ways: using simple
file-polling or via a <a class="reference external" href="https://redis.io/">Redis</a> backend. File-polling is great for
testing, but unsuitable for a production system due to the wear and tear on SSDs.</p>
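<p>For illustration, the file-polling mechanism boils down to a loop like the following. This is a minimal generic sketch, not the actual code used in the Docker images; <cite>poll_once</cite> and <cite>predict</cite> are hypothetical names:</p>

```python
# Minimal sketch of file-polling inference: watch an input directory,
# run a "prediction" on each new file, write the result to an output
# directory and delete the input file. (Generic illustration only.)
import os
import tempfile

def poll_once(in_dir, out_dir, predict):
    for fname in sorted(os.listdir(in_dir)):
        src = os.path.join(in_dir, fname)
        dst = os.path.join(out_dir, fname + ".prediction")
        with open(src) as f:
            result = predict(f.read())
        with open(dst, "w") as f:
            f.write(result)
        # the constant create/delete churn is what wears out SSDs
        os.remove(src)

in_dir = tempfile.mkdtemp()
out_dir = tempfile.mkdtemp()
with open(os.path.join(in_dir, "sample.txt"), "w") as f:
    f.write("hello")
poll_once(in_dir, out_dir, predict=str.upper)
print(os.listdir(out_dir))  # → ['sample.txt.prediction']
```

<p>A production setup would run this loop continuously (with a sleep between iterations), which is exactly the churn a Redis backend avoids.</p>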
<p>Initially, I developed a really simple library for sending and receiving data via Redis,
called <em>simple-redis-helper</em>:</p>
<p><a class="reference external" href="https://github.com/fracpete/simple-redis-helper">https://github.com/fracpete/simple-redis-helper</a></p>
<p>With this library you get some command-line tools for broadcasting, listening, etc. Sufficient
for someone who is comfortable with the command-line (or especially when logged in remotely
via terminal), but not so great for your clients.</p>
<p>Now, there is the brilliant <a class="reference external" href="https://www.gradio.app/">gradio</a> library that was specifically
developed for such scenarios: to create easy-to-use and great-looking interfaces for your machine
learning models.</p>
<p>The last couple of days, I have put together a new library that is tailored to our Docker images
called <em>gifr</em>:</p>
<p><a class="reference external" href="https://github.com/waikato-datamining/gifr">https://github.com/waikato-datamining/gifr</a></p>
<p>With the first release, the following types of models are supported:</p>
<ul class="simple">
<li><p>image classification</p></li>
<li><p>image segmentation</p></li>
<li><p>object detection/instance segmentation</p></li>
<li><p>text generation</p></li>
</ul></description><category>release</category><guid>https://www.data-mining.co.nz/news/2023-11-03-gifr-release/</guid><pubDate>Fri, 03 Nov 2023 01:00:00 GMT</pubDate></item><item><title>llm-dataset-converter release</title><link>https://www.data-mining.co.nz/news/2023-10-27-ldc-release/</link><dc:creator>University of Waikato</dc:creator><description><p>Over the last couple of months, we have been working on a little command-line tool that
allows you to convert LLM datasets from one format into another, appropriately called
<cite>llm-dataset-converter</cite>:</p>
<p><a class="reference external" href="https://github.com/waikato-llm/llm-dataset-converter">https://github.com/waikato-llm/llm-dataset-converter</a></p>
<p>With the first release (0.0.1), you can load data from and save to various formats
(csv/tsv, text, json, jsonlines, parquet). The tool lets you define pipelines using the following format:</p>
<p><cite>reader [filter [filter ...]] [writer]</cite></p>
<p>Each component in the pipeline comes with its own set of command-line parameters. You can even <em>tee</em> off
records and process them differently (e.g., writing the same data to different output formats).</p>
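<p>The underlying pattern can be sketched with plain Python generators. The names below are hypothetical and purely for illustration, not the library's actual plugins or API:</p>

```python
# Generic sketch of the "reader [filter [filter ...]] [writer]"
# pipeline pattern: each stage consumes and produces records.

def reader(lines):
    # yields one record (dict) per input line
    for line in lines:
        yield {"text": line}

def keyword_filter(records, keyword):
    # only passes on records whose text contains the keyword
    for rec in records:
        if keyword in rec["text"]:
            yield rec

def writer(records):
    # collects the texts; a real writer would serialize to csv/json/...
    return [rec["text"] for rec in records]

# reader -> filter -> writer
result = writer(keyword_filter(reader(["keep this", "drop"]), "keep"))
print(result)  # → ['keep this']
```

<p>Because every stage only sees a stream of records, filters can be chained in any order and a <em>tee</em> simply feeds the same stream to two downstream branches.</p>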
<p>The library also offers other tools, e.g., for downloading files or datasets from huggingface or for combining text files.</p>
<p>In order to make such pipeline-oriented tools simpler to develop, we created a base library
that manages the handling of plugins (and, if necessary, their compatibility) called <cite>seppl</cite>
(<em>Simple Entry Point PipeLines</em>):</p>
<p><a class="reference external" href="https://github.com/waikato-datamining/seppl">https://github.com/waikato-datamining/seppl</a></p>
<p>Thanks to seppl, the llm-dataset-converter library can be easily extended with additional modules, as it uses
a dynamic approach to locating plugins: you only need to define which modules to search for which superclass
(like <cite>Reader</cite>, <cite>Filter</cite>, <cite>Writer</cite>).</p></description><category>release</category><guid>https://www.data-mining.co.nz/news/2023-10-27-ldc-release/</guid><pubDate>Thu, 26 Oct 2023 20:47:00 GMT</pubDate></item></channel></rss>