RStudio AI Blog
https://blogs.rstudio.com/tensorflow/
News, concepts, and applications as regards deep learning, probabilistic computation, distributed computing and machine learning automation from R.
Sun, 18 Oct 2020 00:00:00 +0000
Classifying images with torch
Sigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2020-10-19-torch-image-classification
<p>In recent posts, we’ve been exploring essential <code>torch</code> functionality: <a href="https://blogs.rstudio.com/ai/posts/2020-10-01-torch-network-from-scratch/">tensors</a>, the sine qua non of every deep learning framework; <a href="https://blogs.rstudio.com/ai/posts/2020-10-05-torch-network-with-autograd">autograd</a>, <code>torch</code>’s implementation of reverse-mode automatic differentiation; <a href="https://blogs.rstudio.com/ai/posts/2020-10-07-torch-modules">modules</a>, composable building blocks of neural networks; and <a href="https://blogs.rstudio.com/ai/posts/2020-10-09-torch-optim/">optimizers</a>, the – well – optimization algorithms that <code>torch</code> provides.</p>
<p>But we haven’t really had our “hello world” moment yet, at least not if by “hello world” you mean the inevitable <em>deep learning experience of classifying pets</em>. Cat or dog? Beagle or boxer? Chinook or Chihuahua? We’ll distinguish ourselves by asking a (slightly) different question: What kind of bird?</p>
<p>Topics we’ll address on our way:</p>
<ul>
<li><p>The core roles of <code>torch</code> <em>datasets</em> and <em>data loaders</em>, respectively.</p></li>
<li><p>How to apply <code>transform</code>s, both for image preprocessing and data augmentation.</p></li>
<li><p>How to use Resnet <span class="citation">(He et al. 2015)</span>, a pre-trained model that comes with <code>torchvision</code>, for transfer learning.</p></li>
<li><p>How to use learning rate schedulers, and in particular, the one-cycle learning rate algorithm <span class="citation">(Smith and Topin 2017)</span>.</p></li>
<li><p>How to find a good initial learning rate.</p></li>
</ul>
<p>For convenience, the code is available on <a href="https://colab.research.google.com/drive/1OJzzqiQVbh3ZdLB2L2t_DhBGInlh9o-k?usp=sharing">Google Colaboratory</a> – no copy-pasting required.</p>
<h2 id="data-loading-and-preprocessing">Data loading and preprocessing</h2>
<p>The example dataset used here is available on <a href="https://www.kaggle.com/gpiosenka/100-bird-species/data" class="uri">Kaggle</a>.</p>
<p>Conveniently, it may be obtained using <a href="https://github.com/mlverse/torchdatasets"><code>torchdatasets</code></a>, which uses <a href="https://github.com/rstudio/pins"><code>pins</code></a> for authentication, retrieval and storage. To enable <code>pins</code> to manage your Kaggle downloads, please follow the instructions <a href="https://pins.rstudio.com/articles/boards-kaggle.html">here</a>.</p>
<p>This dataset is very “clean”, unlike the images we may be used to from, e.g., <a href="http://image-net.org/">ImageNet</a>. To help with generalization, we introduce noise during training – in other words, we perform <em>data augmentation</em>. In <code>torchvision</code>, data augmentation is part of an <em>image processing pipeline</em> that first converts an image to a tensor, and then applies any transformations such as resizing, cropping, normalization, or various forms of distortion.</p>
<p>Below are the transformations performed on the training set. Note how most of them are for data augmentation, while normalization is done to comply with what’s expected by ResNet.</p>
<h4 id="image-preprocessing-pipeline">Image preprocessing pipeline</h4>
<pre class="r"><code>library(torch)
library(torchvision)
library(torchdatasets)
library(dplyr)
library(pins)
library(ggplot2)
device <- if (cuda_is_available()) torch_device("cuda:0") else "cpu"
train_transforms <- function(img) {
  img %>%
    # first convert image to tensor
    transform_to_tensor() %>%
    # then move to the GPU (if available)
    (function(x) x$to(device = device)) %>%
    # data augmentation
    transform_random_resized_crop(size = c(224, 224)) %>%
    # data augmentation
    transform_color_jitter() %>%
    # data augmentation
    transform_random_horizontal_flip() %>%
    # normalize according to what is expected by resnet
    transform_normalize(mean = c(0.485, 0.456, 0.406), std = c(0.229, 0.224, 0.225))
}</code></pre>
<p>On the validation set, we don’t want to introduce noise, but still need to resize, crop, and normalize the images. The test set should be treated identically.</p>
<pre class="r"><code>valid_transforms <- function(img) {
  img %>%
    transform_to_tensor() %>%
    (function(x) x$to(device = device)) %>%
    transform_resize(256) %>%
    transform_center_crop(224) %>%
    transform_normalize(mean = c(0.485, 0.456, 0.406), std = c(0.229, 0.224, 0.225))
}
test_transforms <- valid_transforms</code></pre>
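<p>As a rough illustration of what the normalization step computes, the following base-R sketch standardizes each channel of a channels-first array with the mean and standard deviation values used above, then inverts the operation. The helper names <code>normalize_chw()</code> and <code>denormalize_chw()</code> are made up for this example and are not part of <code>torchvision</code>; the random array merely stands in for an image tensor.</p>

```r
# ImageNet channel statistics, as passed to transform_normalize() above
mean_rgb <- c(0.485, 0.456, 0.406)
std_rgb <- c(0.229, 0.224, 0.225)

# hypothetical helper: (x - mean) / std, applied per channel (dimension 1)
normalize_chw <- function(x, mean, std) {
  x <- sweep(x, 1, mean, "-")
  sweep(x, 1, std, "/")
}

# hypothetical helper: the inverse operation, x * std + mean
denormalize_chw <- function(x, mean, std) {
  x <- sweep(x, 1, std, "*")
  sweep(x, 1, mean, "+")
}

set.seed(1)
# stand-in for a 3 x H x W image with values in [0, 1]
img <- array(runif(3 * 4 * 4), dim = c(3, 4, 4))
normalized <- normalize_chw(img, mean_rgb, std_rgb)
restored <- denormalize_chw(normalized, mean_rgb, std_rgb)

# the round trip recovers the original pixel values
all.equal(restored, img)
```

<p>Using <code>sweep()</code> with an explicit margin makes sure the three statistics are matched to the channel dimension, rather than being recycled along whichever dimension happens to come first.</p>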
<p>And now, let’s get the data, nicely divided into training, validation and test sets. Additionally, we tell the corresponding R objects what transformations they’re expected to apply:<a href="#fn1" class="footnote-ref" id="fnref1"><sup>1</sup></a></p>
<pre class="r"><code>train_ds <- bird_species_dataset("data", download = TRUE, transform = train_transforms)
valid_ds <- bird_species_dataset("data", split = "valid", transform = valid_transforms)
test_ds <- bird_species_dataset("data", split = "test", transform = test_transforms)</code></pre>
<p>Two things to note. First, transformations are part of the <em>dataset</em> concept, as opposed to the <em>data loader</em> we’ll encounter shortly. Second, let’s take a look at how the images have been stored on disk. The overall directory structure (starting from <code>data</code>, which we specified as the root directory to be used) is this:</p>
<pre><code>data/bird_species/train
data/bird_species/valid
data/bird_species/test</code></pre>
<p>In the <code>train</code>, <code>valid</code>, and <code>test</code> directories, different classes of images reside in their own folders. For example, here is the directory layout for the first three classes in the test set:</p>
<pre><code>data/bird_species/test/ALBATROSS/
- data/bird_species/test/ALBATROSS/1.jpg
- data/bird_species/test/ALBATROSS/2.jpg
- data/bird_species/test/ALBATROSS/3.jpg
- data/bird_species/test/ALBATROSS/4.jpg
- data/bird_species/test/ALBATROSS/5.jpg
data/bird_species/test/'ALEXANDRINE PARAKEET'/
- data/bird_species/test/'ALEXANDRINE PARAKEET'/1.jpg
- data/bird_species/test/'ALEXANDRINE PARAKEET'/2.jpg
- data/bird_species/test/'ALEXANDRINE PARAKEET'/3.jpg
- data/bird_species/test/'ALEXANDRINE PARAKEET'/4.jpg
- data/bird_species/test/'ALEXANDRINE PARAKEET'/5.jpg
data/bird_species/test/'AMERICAN BITTERN'/
- data/bird_species/test/'AMERICAN BITTERN'/1.jpg
- data/bird_species/test/'AMERICAN BITTERN'/2.jpg
- data/bird_species/test/'AMERICAN BITTERN'/3.jpg
- data/bird_species/test/'AMERICAN BITTERN'/4.jpg
- data/bird_species/test/'AMERICAN BITTERN'/5.jpg</code></pre>
<p>This is exactly the kind of layout expected by <code>torchvision</code>’s <code>image_folder_dataset()</code> – and in fact, <code>bird_species_dataset()</code> instantiates a subtype of this class. Had we downloaded the data manually, respecting the required directory structure, we could have created the datasets like so:</p>
<pre class="r"><code># e.g.
train_ds <- image_folder_dataset(
  file.path(data_dir, "train"),
  transform = train_transforms)</code></pre>
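<p>To make the folder convention concrete, here is a small base-R sketch that recreates a miniature version of the layout in a temporary directory and derives class names and integer labels from it. It is only meant to illustrate the idea: the directory name <code>bird_species_demo</code> is invented, the files are empty placeholders, and the assumption that class indices follow the alphabetical order of the folder names is ours, not something taken from the <code>image_folder_dataset()</code> source.</p>

```r
# build a tiny stand-in for data/bird_species/test in a temporary directory
root <- file.path(tempdir(), "bird_species_demo", "test")
species <- c("ALBATROSS", "ALEXANDRINE PARAKEET", "AMERICAN BITTERN")
for (sp in species) {
  dir.create(file.path(root, sp), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(root, sp, paste0(1:5, ".jpg")))  # empty placeholder files
}

# class names are the sub-directory names (assumed: sorted alphabetically)
class_names_demo <- sort(list.dirs(root, full.names = FALSE, recursive = FALSE))

# every file gets the 1-based index of the folder it lives in
files <- list.files(root, recursive = TRUE)
labels <- match(dirname(files), class_names_demo)

# look up the class name for each label and count the images per class
table(class_names_demo[labels])
```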
<p>Now that we have the data, let’s see how many items there are in each set.</p>
<pre class="r"><code>train_ds$.length()
valid_ds$.length()
test_ds$.length()</code></pre>
<pre><code>31316
1125
1125</code></pre>
<p>That training set is really big! It’s thus recommended to run this on GPU, or just play around with the provided Colab notebook.</p>
<p>With so many samples, we’re curious how many classes there are.</p>
<pre class="r"><code>class_names <- test_ds$classes
length(class_names)</code></pre>
<pre><code>225</code></pre>
<p>So we <em>do</em> have a substantial training set, but the task is formidable as well: We’re going to tell apart no fewer than 225 different bird species.</p>
<h4 id="data-loaders">Data loaders</h4>
<p>While <em>datasets</em> know what to do with each single item, <em>data loaders</em> know how to treat them collectively. How many samples make up a batch? Do we want to feed them in the same order always, or instead, have a different order chosen for every epoch?</p>
<pre class="r"><code>batch_size <- 64
train_dl <- dataloader(train_ds, batch_size = batch_size, shuffle = TRUE)
valid_dl <- dataloader(valid_ds, batch_size = batch_size)
test_dl <- dataloader(test_ds, batch_size = batch_size)</code></pre>
<p>Data loaders, too, may be queried for their length. Now length means: How many batches?</p>
<pre class="r"><code>train_dl$.length()
valid_dl$.length()
test_dl$.length() </code></pre>
<pre><code>490
18
18</code></pre>
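<p>The batch counts follow directly from the dataset sizes. The base-R sketch below reproduces them and mimics how item indices could be grouped into batches, with or without shuffling. The helper <code>make_batches()</code> is hypothetical, and the sketch assumes that an incomplete final batch is kept rather than dropped.</p>

```r
n_train <- 31316
n_valid <- 1125
batch_size <- 64

# number of batches = number of items divided by batch size, rounded up
ceiling(n_train / batch_size)
ceiling(n_valid / batch_size)

# hypothetical helper: split item indices into batches, optionally shuffled
make_batches <- function(n, batch_size, shuffle = FALSE) {
  idx <- if (shuffle) sample(n) else seq_len(n)
  unname(split(idx, ceiling(seq_along(idx) / batch_size)))
}

set.seed(42)
batches <- make_batches(n_valid, batch_size, shuffle = TRUE)
length(batches)             # number of batches
lengths(batches)[c(1, 18)]  # sizes of the first and the last batch
```

<p>With 1125 items and a batch size of 64, the last batch holds the remaining 37 items.</p>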
<h4 id="some-birds">Some birds</h4>
<p>Next, let’s view a few images from the test set. We can retrieve the first batch – images and corresponding classes – by creating an iterator from the <code>dataloader</code> and calling <code>.next()</code> on it:</p>
<pre class="r"><code># for display purposes, here we are actually using a batch_size of 24
batch <- test_dl$.iter()$.next()</code></pre>
<p><code>batch</code> is a list, the first item being the image tensors:</p>
<pre class="r"><code>batch[[1]]$size()</code></pre>
<pre><code>[1] 24 3 224 224</code></pre>
<p>And the second, the classes:</p>
<pre class="r"><code>batch[[2]]$size()</code></pre>
<pre><code>[1] 24</code></pre>
<p>Classes are coded as integers, to be used as indices in a vector of class names. We’ll use those for labeling the images.</p>
<pre class="r"><code>classes <- batch[[2]]
classes</code></pre>
<pre><code>torch_tensor
1
1
1
1
1
2
2
2
2
2
3
3
3
3
3
4
4
4
4
4
5
5
5
5
[ GPULongType{24} ]</code></pre>
<p>The image tensors have shape <code>batch_size x num_channels x height x width</code>. For plotting using <code>as.raster()</code>, we need to rearrange the dimensions such that channels come last. We also undo the normalization applied by the dataset’s transformations.</p>
<p>Here are the first twenty-four images:</p>
<pre class="r"><code># move to the CPU before converting to an R array; put channels last
images <- as_array(batch[[1]]$cpu()) %>% aperm(perm = c(1, 3, 4, 2))
mean <- c(0.485, 0.456, 0.406)
std <- c(0.229, 0.224, 0.225)
# undo the normalization channel by channel (channels are now dimension 4)
images <- sweep(images, 4, std, "*")
images <- sweep(images, 4, mean, "+")
images <- images * 255
images[images > 255] <- 255
images[images < 0] <- 0
par(mfcol = c(4,6), mar = rep(1, 4))
images %>%
  purrr::array_tree(1) %>%
  purrr::set_names(class_names[as_array(classes$cpu())]) %>%
  purrr::map(as.raster, max = 255) %>%
  purrr::iwalk(~{plot(.x); title(.y)})</code></pre>
<p><img src="https://blogs.rstudio.com/tensorflow//posts/2020-10-19-torch-image-classification/images/image_classif_birds.png" width="576" /></p>
<h2 id="model">Model</h2>
<p>The backbone of our model is a pre-trained instance of ResNet.</p>
<pre class="r"><code>model <- model_resnet18(pretrained = TRUE)</code></pre>
<p>But we want to distinguish among our 225 bird species, while ResNet was trained on 1000 different classes. What can we do? We simply replace the output layer.</p>
<p>The new output layer is also the only one whose weights we are going to train – leaving all other ResNet parameters the way they are. Technically, we <em>could</em> perform backpropagation through the complete model, striving to fine-tune ResNet’s weights as well. However, this would slow down training significantly. In fact, the choice is not all-or-none: It is up to us how many of the original parameters to keep fixed, and how many to “set free” for fine tuning. For the task at hand, we’ll be content to just train the newly added output layer: With the abundance of animals, including birds, in ImageNet, we expect the trained ResNet to know a lot about them!</p>
<pre class="r"><code>model$parameters %>% purrr::walk(function(param) param$requires_grad_(FALSE))</code></pre>
<p>To replace the output layer, the model is modified in-place:</p>
<pre class="r"><code>num_features <- model$fc$in_features
model$fc <- nn_linear(in_features = num_features, out_features = length(class_names))</code></pre>
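<p>To get a feeling for how little is actually being trained, the size of the new output layer can be worked out by hand. The short sketch below does the arithmetic in base R; it assumes that <code>model$fc$in_features</code> is 512, which is the input width of the final layer in ResNet-18.</p>

```r
in_features <- 512  # assumed: what model$fc$in_features reports for ResNet-18
n_classes <- 225    # length(class_names) in this post

# a linear layer holds one weight per (input, output) pair plus one bias per output
n_weights <- in_features * n_classes
n_biases <- n_classes
n_trainable <- n_weights + n_biases
n_trainable
```

<p>All remaining parameters stay frozen, so gradients need to be computed for this single layer only.</p>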
<p>Now put the modified model on the GPU (if available):</p>
<pre class="r"><code>model <- model$to(device = device)</code></pre>
<h2 id="training">Training</h2>
<p>For optimization, we use cross entropy loss and stochastic gradient descent.</p>
<pre class="r"><code>criterion <- nn_cross_entropy_loss()
optimizer <- optim_sgd(model$parameters, lr = 0.1, momentum = 0.9)</code></pre>
<h4 id="finding-an-optimally-efficient-learning-rate">Finding an optimally efficient learning rate</h4>
<p>We set the learning rate to <code>0.1</code>, but that is just a formality. As has become widely known due to the excellent lectures by <a href="http://fast.ai">fast.ai</a>, it makes sense to spend some time upfront to determine an efficient learning rate. While out-of-the-box, <code>torch</code> does not provide a tool like fast.ai’s learning rate finder, the logic is straightforward to implement. Here’s how to find a good learning rate, as translated to R from <a href="https://sgugger.github.io/how-do-you-find-a-good-learning-rate.html">Sylvain Gugger’s post</a>:</p>
<pre class="r"><code># ported from: https://sgugger.github.io/how-do-you-find-a-good-learning-rate.html
losses <- c()
log_lrs <- c()
find_lr <- function(init_value = 1e-8, final_value = 10, beta = 0.98) {
  num <- train_dl$.length()
  mult <- (final_value/init_value)^(1/num)
  lr <- init_value
  optimizer$param_groups[[1]]$lr <- lr
  avg_loss <- 0
  best_loss <- 0
  batch_num <- 0
  for (b in enumerate(train_dl)) {
    batch_num <- batch_num + 1
    optimizer$zero_grad()
    output <- model(b[[1]]$to(device = device))
    loss <- criterion(output, b[[2]]$to(device = device))
    # Compute the smoothed loss
    avg_loss <- beta * avg_loss + (1-beta) * loss$item()
    smoothed_loss <- avg_loss / (1 - beta^batch_num)
    # Stop if the loss is exploding
    if (batch_num > 1 && smoothed_loss > 4 * best_loss) break
    # Record the best loss
    if (smoothed_loss < best_loss || batch_num == 1) best_loss <- smoothed_loss
    # Store the values
    losses <<- c(losses, smoothed_loss)
    log_lrs <<- c(log_lrs, (log(lr, 10)))
    loss$backward()
    optimizer$step()
    # Update the lr for the next step
    lr <- lr * mult
    optimizer$param_groups[[1]]$lr <- lr
  }
}
find_lr()
df <- data.frame(log_lrs = log_lrs, losses = losses)
ggplot(df, aes(log_lrs, losses)) + geom_point(size = 1) + theme_classic()</code></pre>
<p><img src="https://blogs.rstudio.com/tensorflow//posts/2020-10-19-torch-image-classification/images/lr_finder.png" width="372" /></p>
<p>The best learning rate is not the exact one where loss is at a minimum. Instead, it should be picked somewhat earlier on the curve, while loss is still decreasing. <code>0.05</code> looks like a sensible choice.</p>
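<p>Two pieces of arithmetic in the learning rate finder are worth isolating: the multiplicative learning rate sweep and the bias-corrected smoothing of the loss. The base-R sketch below reproduces both without any training. The helper <code>smooth_losses()</code> is a hypothetical name, and the constant loss values are made up purely to show the effect of the correction term.</p>

```r
init_value <- 1e-8
final_value <- 10
num <- 490   # number of training batches, i.e. train_dl$.length()
beta <- 0.98

# multiplying by `mult` once per batch takes the rate from init_value to final_value
mult <- (final_value / init_value)^(1 / num)
lrs <- init_value * mult^(0:num)
range(lrs)

# hypothetical helper: exponentially weighted average with bias correction
smooth_losses <- function(losses, beta) {
  avg <- 0
  smoothed <- numeric(length(losses))
  for (i in seq_along(losses)) {
    avg <- beta * avg + (1 - beta) * losses[i]
    # without this division, early values would be biased towards zero
    smoothed[i] <- avg / (1 - beta^i)
  }
  smoothed
}

# thanks to the bias correction, a constant loss is reproduced from the first step on
smooth_losses(rep(2.5, 5), beta)
```

<p>Because the rate grows geometrically, the points in the plot are evenly spaced on the logarithmic axis.</p>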
<p>This value is nothing but an anchor, however. <em>Learning rate schedulers</em> allow learning rates to evolve according to some proven algorithm. Among others, <code>torch</code> implements one-cycle learning <span class="citation">(Smith and Topin 2017)</span>, cyclical learning rates <span class="citation">(Smith 2015)</span>, and cosine annealing with warm restarts <span class="citation">(Loshchilov and Hutter 2016)</span>.</p>
<p>Here, we use <code>lr_one_cycle()</code>, passing in our newly found and, hopefully, optimally efficient value <code>0.05</code> as the maximum learning rate. <code>lr_one_cycle()</code> will start with a low rate, then gradually ramp up until it reaches the allowed maximum. After that, the learning rate will slowly, continuously decrease, until it falls slightly below its initial value.</p>
<p>All this happens not per epoch, but exactly once, which is why the name has <code>one_cycle</code> in it. Here’s how the evolution of learning rates looks in our example:</p>
<p><img src="https://blogs.rstudio.com/tensorflow//posts/2020-10-19-torch-image-classification/images/one_cycle_lr.png" width="315" /></p>
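<p>For readers who would like to see the shape of such a schedule in numbers, here is a minimal base-R sketch of a one-cycle curve with cosine ramps. It is an illustrative re-implementation, not the code behind <code>lr_one_cycle()</code>: the function name <code>one_cycle_lr()</code> is hypothetical, and the values chosen for <code>pct_start</code>, <code>div_factor</code> and <code>final_div_factor</code> are assumptions made for this example, so the exact start and end rates may differ from what the library produces.</p>

```r
# hypothetical re-implementation for illustration; not torch's lr_one_cycle()
# pct_start: assumed share of steps spent ramping up
# div_factor: assumed ratio between maximum and starting rate
# final_div_factor: assumed ratio between starting and final rate
one_cycle_lr <- function(step, total_steps, max_lr, pct_start = 0.3,
                         div_factor = 25, final_div_factor = 1e4) {
  initial_lr <- max_lr / div_factor
  final_lr <- initial_lr / final_div_factor
  warmup_steps <- round(pct_start * total_steps)
  # cosine interpolation: returns `from` at pct = 0 and `to` at pct = 1
  cos_anneal <- function(from, to, pct) to + (from - to) / 2 * (1 + cos(pi * pct))
  ifelse(
    step <= warmup_steps,
    cos_anneal(initial_lr, max_lr, step / warmup_steps),
    cos_anneal(max_lr, final_lr, (step - warmup_steps) / (total_steps - warmup_steps))
  )
}

total_steps <- 10 * 490  # num_epochs * number of training batches
steps <- 0:total_steps
lrs <- one_cycle_lr(steps, total_steps, max_lr = 0.05)

lrs[1]          # starting rate: max_lr / div_factor
max(lrs)        # peak rate: max_lr
which.max(lrs)  # position of the peak, after the ramp-up phase
```

<p>The schedule is defined over the total number of steps, not per epoch, which is why the scheduler has to be advanced once per batch.</p>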
<p>Before we start training, let’s quickly re-initialize the model, so as to start from a clean slate:</p>
<pre class="r"><code>model <- model_resnet18(pretrained = TRUE)
model$parameters %>% purrr::walk(function(param) param$requires_grad_(FALSE))
num_features <- model$fc$in_features
model$fc <- nn_linear(in_features = num_features, out_features = length(class_names))
model <- model$to(device = device)
criterion <- nn_cross_entropy_loss()
optimizer <- optim_sgd(model$parameters, lr = 0.05, momentum = 0.9)</code></pre>
<p>And instantiate the scheduler:</p>
<pre class="r"><code>num_epochs <- 10
scheduler <- optimizer %>%
  lr_one_cycle(max_lr = 0.05, epochs = num_epochs, steps_per_epoch = train_dl$.length())</code></pre>
<h4 id="training-loop">Training loop</h4>
<p>Now we train for ten epochs. For every training batch, we call <code>scheduler$step()</code> to adjust the learning rate. Notably, this has to be done <em>after</em> <code>optimizer$step()</code>.</p>
<pre class="r"><code>train_batch <- function(b) {
  optimizer$zero_grad()
  output <- model(b[[1]])
  loss <- criterion(output, b[[2]]$to(device = device))
  loss$backward()
  optimizer$step()
  scheduler$step()
  loss$item()
}
valid_batch <- function(b) {
  output <- model(b[[1]])
  loss <- criterion(output, b[[2]]$to(device = device))
  loss$item()
}
for (epoch in 1:num_epochs) {
  model$train()
  train_losses <- c()
  for (b in enumerate(train_dl)) {
    loss <- train_batch(b)
    train_losses <- c(train_losses, loss)
  }
  model$eval()
  valid_losses <- c()
  for (b in enumerate(valid_dl)) {
    loss <- valid_batch(b)
    valid_losses <- c(valid_losses, loss)
  }
  cat(sprintf("\nLoss at epoch %d: training: %3f, validation: %3f\n", epoch, mean(train_losses), mean(valid_losses)))
}</code></pre>
<pre><code>Loss at epoch 1: training: 2.662901, validation: 0.790769
Loss at epoch 2: training: 1.543315, validation: 1.014409
Loss at epoch 3: training: 1.376392, validation: 0.565186
Loss at epoch 4: training: 1.127091, validation: 0.575583
Loss at epoch 5: training: 0.916446, validation: 0.281600
Loss at epoch 6: training: 0.775241, validation: 0.215212
Loss at epoch 7: training: 0.639521, validation: 0.151283
Loss at epoch 8: training: 0.538825, validation: 0.106301
Loss at epoch 9: training: 0.407440, validation: 0.083270
Loss at epoch 10: training: 0.354659, validation: 0.080389</code></pre>
<p>It looks like the model made good progress, but we don’t yet know anything about classification accuracy in absolute terms. We’ll check that out on the test set.</p>
<h2 id="test-set-accuracy">Test set accuracy</h2>
<p>Finally, we calculate accuracy on the test set:</p>
<pre class="r"><code>model$eval()
test_batch <- function(b) {
  output <- model(b[[1]])
  labels <- b[[2]]$to(device = device)
  loss <- criterion(output, labels)
  test_losses <<- c(test_losses, loss$item())
  # torch_max returns a list, with position 1 containing the values
  # and position 2 containing the respective indices
  predicted <- torch_max(output$data(), dim = 2)[[2]]
  total <<- total + labels$size(1)
  # add number of correct classifications in this batch to the aggregate
  correct <<- correct + (predicted == labels)$sum()$item()
}
test_losses <- c()
total <- 0
correct <- 0
for (b in enumerate(test_dl)) {
test_batch(b)
}
mean(test_losses)</code></pre>
<pre><code>[1] 0.03719</code></pre>
<pre class="r"><code>test_accuracy <- correct/total
test_accuracy</code></pre>
<pre><code>[1] 0.98756</code></pre>
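<p>The accuracy calculation itself boils down to an arg-max per sample followed by a comparison with the labels. The following base-R sketch mirrors that logic on a made-up score matrix; the numbers are invented for illustration and have nothing to do with the model’s actual output. The last line shows that the accuracy reported above is consistent with 1111 of the 1125 test images being classified correctly.</p>

```r
# made-up scores for 4 samples (rows) and 3 classes (columns)
scores <- rbind(
  c(2.0, 0.1, -1.0),
  c(0.3, 1.5, 0.2),
  c(-0.5, 0.0, 3.0),
  c(1.2, 0.9, 0.1)
)
labels <- c(1, 2, 3, 2)

# predicted class = index of the highest score in each row (1-based)
predicted <- apply(scores, 1, which.max)
correct <- sum(predicted == labels)
total <- length(labels)
correct / total

# the test accuracy reported above corresponds to 1111 correct out of 1125
round(1111 / 1125, 5)
```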
<p>An impressive result, given how many different species there are!</p>
<h2 id="wrapup">Wrapup</h2>
<p>Hopefully, this has been a useful introduction to classifying images with <code>torch</code>, as well as to its non-domain-specific architectural elements, like datasets, data loaders, and learning-rate schedulers. Future posts will explore other domains, as well as move on beyond “hello world” in image recognition. Thanks for reading!</p>
<div id="refs" class="references hanging-indent">
<div id="ref-HeZRS15">
<p>He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. “Deep Residual Learning for Image Recognition.” <em>CoRR</em> abs/1512.03385. <a href="http://arxiv.org/abs/1512.03385">http://arxiv.org/abs/1512.03385</a>.</p>
</div>
<div id="ref-LoshchilovH16a">
<p>Loshchilov, Ilya, and Frank Hutter. 2016. “SGDR: Stochastic Gradient Descent with Restarts.” <em>CoRR</em> abs/1608.03983. <a href="http://arxiv.org/abs/1608.03983">http://arxiv.org/abs/1608.03983</a>.</p>
</div>
<div id="ref-Smith15a">
<p>Smith, Leslie N. 2015. “No More Pesky Learning Rate Guessing Games.” <em>CoRR</em> abs/1506.01186. <a href="http://arxiv.org/abs/1506.01186">http://arxiv.org/abs/1506.01186</a>.</p>
</div>
</div>
<div class="footnotes">
<hr />
<ol>
<li id="fn1"><p>Physically, the dataset consists of a single <code>zip</code> file; so it is really the first instruction that downloads all the data. The remaining two function calls perform semantic mappings only.<a href="#fnref1" class="footnote-back">↩︎</a></p></li>
</ol>
</div>
Categories: Torch, R, Image Recognition & Image Processing
sparklyr.flint 0.2: ASOF Joins, OLS Regression, and additional summarizers
Yitao Li
https://blogs.rstudio.com/tensorflow/posts/2020-10-12-sparklyr-flint-0.2.0-released
<p>Since <a href="https://cran.r-project.org/web/packages/sparklyr.flint/index.html"><code>sparklyr.flint</code></a>, a <a href="https://sparklyr.ai"><code>sparklyr</code></a> extension for leveraging <a href="https://github.com/twosigma/flint">Flint</a> time series functionalities through <code>sparklyr</code>, was <a href="https://blogs.rstudio.com/ai/posts/2020-09-07-sparklyr-flint">introduced</a> in September, we have made a number of enhancements to it, and have successfully submitted <code>sparklyr.flint</code> 0.2 to CRAN.</p>
<p>In this blog post, we highlight the following new features and improvements from <code>sparklyr.flint</code> 0.2:</p>
<ul>
<li><a href="#asof-joins">ASOF Joins</a> of Timeseries RDDs</li>
<li><a href="#ols-regression">OLS Regression</a></li>
<li><a href="#additional-summarizers">Additional Summarizers</a></li>
<li><a href="#better-integration-with-sparklyr">Better Integration With <code>sparklyr</code></a></li>
</ul>
<h2 id="asof-joins">ASOF Joins</h2>
<p>For those unfamiliar with the term, ASOF joins are temporal join operations based on inexact matching of timestamps. Within the context of <a href="https://spark.apache.org">Apache Spark</a>, a join operation, loosely speaking, matches records from two data frames (let’s call them <code>left</code> and <code>right</code>) based on some criteria. A temporal join implies matching records in <code>left</code> and <code>right</code> based on timestamps, and with inexact matching of timestamps permitted, it is typically useful to join <code>left</code> and <code>right</code> along one of the following temporal directions:</p>
<ol style="list-style-type: decimal">
<li>Looking behind: if a record from <code>left</code> has timestamp <code>t</code>, then it gets matched with ones from <code>right</code> having the most recent timestamp less than or equal to <code>t</code>.</li>
<li>Looking ahead: if a record from <code>left</code> has timestamp <code>t</code>, then it gets matched with ones from <code>right</code> having the smallest timestamp greater than or equal to (or alternatively, strictly greater than) <code>t</code>.</li>
</ol>
<p>However, oftentimes it is not useful to consider two timestamps as “matching” if they are too far apart. Therefore, an additional constraint on the maximum amount of time to look behind or look ahead is usually also part of an ASOF join operation.</p>
<p>In <code>sparklyr.flint</code> 0.2, all ASOF join functionalities of Flint are accessible via the <code>asof_join()</code> method. For example, given 2 timeseries RDDs <code>left</code> and <code>right</code>:</p>
<pre><code>library(sparklyr)
library(sparklyr.flint)
sc <- spark_connect(master = "local")
left <- copy_to(sc, tibble::tibble(t = seq(10), u = seq(10))) %>%
  from_sdf(is_sorted = TRUE, time_unit = "SECONDS", time_column = "t")
right <- copy_to(sc, tibble::tibble(t = seq(10) + 1, v = seq(10) + 1L)) %>%
  from_sdf(is_sorted = TRUE, time_unit = "SECONDS", time_column = "t")</code></pre>
<p>The following prints the result of matching each record from <code>left</code> with the most recent record(s) from <code>right</code> that are at most 1 second behind.</p>
<pre><code>print(asof_join(left, right, tol = "1s", direction = ">=") %>% to_sdf())
## # Source: spark<?> [?? x 3]
## time u v
## <dttm> <int> <int>
## 1 1970-01-01 00:00:01 1 NA
## 2 1970-01-01 00:00:02 2 2
## 3 1970-01-01 00:00:03 3 3
## 4 1970-01-01 00:00:04 4 4
## 5 1970-01-01 00:00:05 5 5
## 6 1970-01-01 00:00:06 6 6
## 7 1970-01-01 00:00:07 7 7
## 8 1970-01-01 00:00:08 8 8
## 9 1970-01-01 00:00:09 9 9
## 10 1970-01-01 00:00:10 10 10</code></pre>
<p>Whereas if we change the temporal direction to “<”, then each record from <code>left</code> will be matched with any record(s) from <code>right</code> that is strictly in the future and is at most 1 second ahead of the current record from <code>left</code>:</p>
<pre><code>print(asof_join(left, right, tol = "1s", direction = "<") %>% to_sdf())
## # Source: spark<?> [?? x 3]
## time u v
## <dttm> <int> <int>
## 1 1970-01-01 00:00:01 1 2
## 2 1970-01-01 00:00:02 2 3
## 3 1970-01-01 00:00:03 3 4
## 4 1970-01-01 00:00:04 4 5
## 5 1970-01-01 00:00:05 5 6
## 6 1970-01-01 00:00:06 6 7
## 7 1970-01-01 00:00:07 7 8
## 8 1970-01-01 00:00:08 8 9
## 9 1970-01-01 00:00:09 9 10
## 10 1970-01-01 00:00:10 10 11</code></pre>
<p>Notice that regardless of which temporal direction is selected, a left outer join is always performed (i.e., all timestamp values and <code>u</code> values of <code>left</code> from above will always be present in the output, and the <code>v</code> column in the output will contain <code>NA</code> whenever there is no record from <code>right</code> that meets the matching criteria).</p>
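<p>The matching rules can also be spelled out in a few lines of plain R, independent of Spark. The sketch below is only meant to illustrate the semantics on ordinary data frames: <code>asof_join_sketch()</code> is a hypothetical helper, it is not how Flint implements the join, and it assumes integer timestamps in seconds, an inclusive tolerance bound, and at most one record per timestamp in <code>right</code>.</p>

```r
# hypothetical helper illustrating the matching rule; not how Flint implements it
asof_join_sketch <- function(left, right, tol, direction = c(">=", "<")) {
  direction <- match.arg(direction)
  left$v <- vapply(left$t, function(t) {
    if (direction == ">=") {
      # look behind: right timestamps in [t - tol, t]; take the most recent one
      candidates <- which(right$t <= t & right$t >= t - tol)
      pick <- candidates[which.max(right$t[candidates])]
    } else {
      # look ahead: right timestamps in (t, t + tol]; take the earliest one
      candidates <- which(right$t > t & right$t <= t + tol)
      pick <- candidates[which.min(right$t[candidates])]
    }
    # no candidate means no match: keep the left record and fill in NA
    if (length(pick) == 0) NA_integer_ else right$v[pick]
  }, integer(1))
  left
}

# same values as the time series above, as ordinary data frames
left_df <- data.frame(t = 1:10, u = 1:10)
right_df <- data.frame(t = 2:11, v = 2:11)

asof_join_sketch(left_df, right_df, tol = 1, direction = ">=")$v
asof_join_sketch(left_df, right_df, tol = 1, direction = "<")$v
```

<p>Every row of <code>left_df</code> survives in both cases, and the <code>v</code> values agree with the two outputs printed above.</p>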
<h2 id="ols-regression">OLS Regression</h2>
<p>You might be wondering whether the version of this functionality in Flint is more or less identical to <code>lm()</code> in R. Turns out it has much more to offer than <code>lm()</code> does. An OLS regression in Flint will compute useful metrics such as <a href="https://en.wikipedia.org/wiki/Akaike_information_criterion">Akaike information criterion</a> and <a href="https://en.wikipedia.org/wiki/Bayesian_information_criterion">Bayesian information criterion</a>, both of which are useful for model selection purposes, and the calculations of both are parallelized by Flint to fully utilize computational power available in a Spark cluster. In addition, Flint supports ignoring regressors that are constant or nearly constant, which becomes useful when an intercept term is included. To see why this is the case, we need to briefly examine the goal of the OLS regression, which is to find some column vector of coefficients <span class="math inline">\(\mathbf{\beta}\)</span> that minimizes <span class="math inline">\(\|\mathbf{y} - \mathbf{X} \mathbf{\beta}\|^2\)</span>, where <span class="math inline">\(\mathbf{y}\)</span> is the column vector of response variables, and <span class="math inline">\(\mathbf{X}\)</span> is a matrix consisting of columns of regressors plus an entire column of <span class="math inline">\(1\)</span>s representing the intercept terms. The solution to this problem is <span class="math inline">\(\mathbf{\beta} = (\mathbf{X}^\intercal\mathbf{X})^{-1}\mathbf{X}^\intercal\mathbf{y}\)</span>, assuming the Gram matrix <span class="math inline">\(\mathbf{X}^\intercal\mathbf{X}\)</span> is non-singular. 
However, if <span class="math inline">\(\mathbf{X}\)</span> contains a column of all <span class="math inline">\(1\)</span>s for the intercept and another column formed by a regressor that is constant (or nearly so), then the columns of <span class="math inline">\(\mathbf{X}\)</span> will be linearly dependent (or nearly so) and <span class="math inline">\(\mathbf{X}^\intercal\mathbf{X}\)</span> will be singular (or nearly so), which makes the computation problematic. A constant regressor, though, essentially plays the same role as the intercept term does, so simply excluding it from <span class="math inline">\(\mathbf{X}\)</span> solves the problem. Also, speaking of inverting the Gram matrix, readers who remember the concept of “condition number” from numerical analysis may be thinking that computing <span class="math inline">\(\mathbf{\beta} = (\mathbf{X}^\intercal\mathbf{X})^{-1}\mathbf{X}^\intercal\mathbf{y}\)</span> could be numerically unstable if <span class="math inline">\(\mathbf{X}^\intercal\mathbf{X}\)</span> has a large condition number. This is why Flint also outputs the condition number of the Gram matrix in the OLS regression result, so that one can sanity-check that the underlying quadratic minimization problem is well-conditioned.</p>
<p>So, to summarize, the OLS regression functionality implemented in Flint not only outputs the solution to the problem, but also calculates useful metrics that help data scientists assess the sanity and predictive quality of the resulting model.</p>
<p>To see OLS regression in action with <code>sparklyr.flint</code>, one can run the following example:</p>
<pre><code>mtcars_sdf <- copy_to(sc, mtcars, overwrite = TRUE) %>%
  dplyr::mutate(time = 0L)
mtcars_ts <- from_sdf(mtcars_sdf, is_sorted = TRUE, time_unit = "SECONDS")
model <- ols_regression(mtcars_ts, mpg ~ hp + wt) %>% to_sdf()
print(model %>% dplyr::select(akaikeIC, bayesIC, cond))
## # Source: spark<?> [?? x 3]
## akaikeIC bayesIC cond
## <dbl> <dbl> <dbl>
## 1 155. 159. 345403.
# ^ output says condition number of the Gram matrix was within reason</code></pre>
<p>and obtain <span class="math inline">\(\mathbf{\beta}\)</span>, the vector of optimal coefficients, with the following:</p>
<pre><code>print(model %>% dplyr::pull(beta))
## [[1]]
## [1] -0.03177295 -3.87783074</code></pre>
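<p>The closed-form solution discussed above is easy to check locally on the same data. The base-R sketch below solves the normal equations for <code>mpg ~ hp + wt</code>, compares the result with <code>lm()</code>, computes a condition number for the Gram matrix, and shows how a constant regressor breaks full column rank. It does not involve Flint, and the condition number it prints is not claimed to match the <code>cond</code> value above, since the exact definition Flint uses is not specified here.</p>

```r
y <- mtcars$mpg
X <- cbind(intercept = 1, hp = mtcars$hp, wt = mtcars$wt)

# normal equations: beta = (X'X)^{-1} X'y, solved without forming the inverse explicitly
gram <- crossprod(X)
beta <- solve(gram, crossprod(X, y))
drop(beta)

# lm() arrives at the same coefficients
coef(lm(mpg ~ hp + wt, data = mtcars))

# 2-norm condition number of the Gram matrix
kappa(gram, exact = TRUE)

# a constant regressor duplicates the intercept column, so X loses full column rank
X_const <- cbind(X, const = 5)
qr(X_const)$rank < ncol(X_const)
```

<p>The coefficients for <code>hp</code> and <code>wt</code> agree with the <code>beta</code> vector printed above; the intercept is the additional first entry.</p>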
<h2 id="additional-summarizers">Additional Summarizers</h2>
<p>The EWMA (Exponential Weighted Moving Average), EMA half-life, and the standardized moment summarizers (namely, skewness and kurtosis) along with a few others which were missing in <code>sparklyr.flint</code> 0.1 are now fully supported in <code>sparklyr.flint</code> 0.2.</p>
<h2 id="better-integration-with-sparklyr">Better Integration With <code>sparklyr</code></h2>
<p>While <code>sparklyr.flint</code> 0.1 included a <code>collect()</code> method for exporting data from a Flint time-series RDD to an R data frame, it did not have a similar method for extracting the underlying Spark data frame from a Flint time-series RDD. This was clearly an oversight. In <code>sparklyr.flint</code> 0.2, one can call <code>to_sdf()</code> on a time-series RDD to get back a Spark data frame that is usable in <code>sparklyr</code> (e.g., as shown by the <code>model %>% to_sdf() %>% dplyr::select(...)</code> examples from above). One can also get to the underlying Spark data frame JVM object reference by calling <code>spark_dataframe()</code> on a Flint time-series RDD (this is usually unnecessary in the vast majority of <code>sparklyr</code> use cases, though).</p>
<h2 id="conclusion">Conclusion</h2>
<p>We have presented a number of new features and improvements introduced in <code>sparklyr.flint</code> 0.2 and deep-dived into some of them in this blog post. We hope you are as excited about them as we are.</p>
<p>Thanks for reading!</p>
<h2 id="acknowledgement">Acknowledgement</h2>
<p>The author would like to thank Mara (<a href="https://github.com/batpigandme">@batpigandme</a>), Sigrid (<a href="https://github.com/skeydan">@skeydan</a>), and Javier (<a href="https://github.com/javierluraschi">@javierluraschi</a>) for their fantastic editorial input on this blog post!</p>
Categories: R, Packages/Releases, Time Series
Published: Mon, 12 Oct 2020 00:00:00 +0000
Optimizers in torch
Sigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2020-10-09-torch-optim
<p>This is the fourth and last installment in a series introducing <code>torch</code> basics. Initially, we <a href="https://blogs.rstudio.com/ai/posts/2020-10-01-torch-network-from-scratch/">focused on <em>tensors</em></a>. To illustrate their power, we coded a complete (if toy-size) neural network from scratch. We didn’t make use of any of <code>torch</code>’s higher-level capabilities – not even <em>autograd</em>, its automatic-differentiation feature.</p>
<p>This changed in the <a href="https://blogs.rstudio.com/ai/posts/2020-10-05-torch-network-with-autograd">follow-up post</a>. No more thinking about derivatives and the chain rule; a single call to <code>backward()</code> did it all.</p>
<p><a href="https://blogs.rstudio.com/ai/posts/2020-10-07-torch-modules">In the third post</a>, the code again saw a major simplification. Instead of tediously assembling a DAG<a href="#fn1" class="footnote-ref" id="fnref1"><sup>1</sup></a> by hand, we let <em>modules</em> take care of the logic.</p>
<p>Starting from that last state, there are just two more things to do. First, we still compute the loss by hand. Second, even though we get the gradients all nicely computed from <em>autograd</em>, we still loop over the model’s parameters, updating them all ourselves. You won’t be surprised to hear that none of this is necessary.</p>
<h2 id="losses-and-loss-functions">Losses and loss functions</h2>
<p><code>torch</code> comes with all the usual loss functions, such as mean squared error, cross entropy, Kullback-Leibler divergence, and the like. In general, there are two usage modes.</p>
<p>Take the example of calculating mean squared error. One way is to call <code>nnf_mse_loss()</code> directly on the prediction and ground truth tensors. For example:</p>
<pre class="r"><code>x <- torch_randn(c(3, 2, 3))
y <- torch_zeros(c(3, 2, 3))
nnf_mse_loss(x, y)</code></pre>
<pre><code>torch_tensor
0.682362
[ CPUFloatType{} ]</code></pre>
<p>Other loss functions designed to be called directly start with <code>nnf_</code> as well: <code>nnf_binary_cross_entropy()</code>, <code>nnf_nll_loss()</code>, <code>nnf_kl_div()</code> … and so on.<a href="#fn2" class="footnote-ref" id="fnref2"><sup>2</sup></a></p>
<p>The second way is to define the algorithm in advance and call it at some later time. Here, the respective constructors all start with <code>nn_</code> and end in <code>_loss</code>. For example: <code>nn_bce_loss()</code>, <code>nn_nll_loss()</code>, <code>nn_kl_div_loss()</code> …<a href="#fn3" class="footnote-ref" id="fnref3"><sup>3</sup></a></p>
<pre class="r"><code>loss <- nn_mse_loss()
loss(x, y)</code></pre>
<pre><code>torch_tensor
0.682362
[ CPUFloatType{} ]</code></pre>
<p>This method may be preferable when one and the same algorithm should be applied to more than one pair of tensors.</p>
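<p>As a minimal sketch of the two modes side by side (tensor shapes and variable names are chosen just for illustration), the functional and the module-based variants should agree with each other, and with a hand-written mean of squared differences:</p>
<pre class="r"><code>library(torch)

x <- torch_randn(c(3, 2, 3))
y <- torch_zeros(c(3, 2, 3))

# mode 1: call the functional form directly
direct <- nnf_mse_loss(x, y)

# mode 2: construct the loss once, then call it as often as needed
mse <- nn_mse_loss()
from_module <- mse(x, y)

# reference value, computed by hand
manual <- (x - y)$pow(2)$mean()

# compare as R numbers, allowing for single-precision rounding
all.equal(direct$item(), from_module$item(), tolerance = 1e-6)
all.equal(direct$item(), manual$item(), tolerance = 1e-6)</code></pre>
<p>Both variants default to averaging over all elements; passing <code>reduction = "sum"</code> (as done in the final version of the network below) switches to summation.</p>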
<h2 id="optimizers">Optimizers</h2>
<p>So far, we’ve been updating model parameters following a simple strategy: The gradients told us which direction on the loss curve was downward; the learning rate told us how big of a step to take. What we did was a straightforward implementation of <em>gradient descent</em>.</p>
<p>However, optimization algorithms used in deep learning get a lot more sophisticated than that. Below, we’ll see how to replace our manual updates using <code>optim_adam()</code>, <code>torch</code>’s implementation of the Adam algorithm <span class="citation">(Kingma and Ba 2017)</span>. First though, let’s take a quick look at how <code>torch</code> optimizers work.</p>
<p>Here is a very simple network, consisting of just one linear layer, to be called on a single data point.</p>
<pre class="r"><code>data <- torch_randn(1, 3)
model <- nn_linear(3, 1)
model$parameters</code></pre>
<pre><code>$weight
torch_tensor
-0.0385 0.1412 -0.5436
[ CPUFloatType{1,3} ]
$bias
torch_tensor
-0.1950
[ CPUFloatType{1} ]</code></pre>
<p>When we create an optimizer, we tell it what parameters it is supposed to work on.</p>
<pre class="r"><code>optimizer <- optim_adam(model$parameters, lr = 0.01)
optimizer</code></pre>
<pre><code><optim_adam>
Inherits from: <torch_Optimizer>
Public:
add_param_group: function (param_group)
clone: function (deep = FALSE)
defaults: list
initialize: function (params, lr = 0.001, betas = c(0.9, 0.999), eps = 1e-08,
param_groups: list
state: list
step: function (closure = NULL)
zero_grad: function () </code></pre>
<p>At any time, we can inspect those parameters:</p>
<pre class="r"><code>optimizer$param_groups[[1]]$params</code></pre>
<pre><code>$weight
torch_tensor
-0.0385 0.1412 -0.5436
[ CPUFloatType{1,3} ]
$bias
torch_tensor
-0.1950
[ CPUFloatType{1} ]</code></pre>
<p>Now we perform the forward and backward passes. The backward pass calculates the gradients, but does <em>not</em> update the parameters, as we can see both from the model <em>and</em> the optimizer objects:</p>
<pre class="r"><code>out <- model(data)
out$backward()
optimizer$param_groups[[1]]$params
model$parameters</code></pre>
<pre><code>$weight
torch_tensor
-0.0385 0.1412 -0.5436
[ CPUFloatType{1,3} ]
$bias
torch_tensor
-0.1950
[ CPUFloatType{1} ]
$weight
torch_tensor
-0.0385 0.1412 -0.5436
[ CPUFloatType{1,3} ]
$bias
torch_tensor
-0.1950
[ CPUFloatType{1} ]</code></pre>
<p>Calling <code>step()</code> on the optimizer actually <em>performs</em> the updates. Again, let’s check that both model and optimizer now hold the updated values:</p>
<pre class="r"><code>optimizer$step()
optimizer$param_groups[[1]]$params
model$parameters</code></pre>
<pre><code>NULL
$weight
torch_tensor
-0.0285 0.1312 -0.5536
[ CPUFloatType{1,3} ]
$bias
torch_tensor
-0.2050
[ CPUFloatType{1} ]
$weight
torch_tensor
-0.0285 0.1312 -0.5536
[ CPUFloatType{1,3} ]
$bias
torch_tensor
-0.2050
[ CPUFloatType{1} ]</code></pre>
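<p>The size of these changes is no coincidence. As a brief aside, assuming the standard Adam update with bias correction as described in the cited paper: on the very first step, the bias-corrected moment estimates reduce to the gradient <span class="math inline">\(g\)</span> and its square, so every parameter moves by (almost exactly) the learning rate, in the direction opposite to the sign of its gradient:</p>
<p><span class="math display">\[
\begin{equation*}
\hat{m}_1 = g, \quad \hat{v}_1 = g^2 \quad \Rightarrow \quad \theta_1 = \theta_0 - lr \ \frac{g}{\sqrt{g^2} + \epsilon} \approx \theta_0 - lr \cdot \mathrm{sign}(g)
\end{equation*}
\]</span></p>
<p>With <code>lr = 0.01</code>, this matches the output above: each of the weights, as well as the bias, has changed by 0.01.</p>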
<p>If we perform optimization in a loop, we need to make sure to call <code>optimizer$zero_grad()</code> on every step, as otherwise gradients would be accumulated. You can see this in our final version of the network.</p>
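<p>To see the accumulation in isolation, here is a small sketch. It uses an all-ones input so the gradient of the output with respect to the weights is known in advance (it equals the input). The choice of <code>optim_sgd()</code> and the names <code>g1</code>, <code>g2</code>, and <code>g3</code> are just for illustration:</p>
<pre class="r"><code>library(torch)

model <- nn_linear(3, 1)
optimizer <- optim_sgd(model$parameters, lr = 0.1)
data <- torch_ones(1, 3)

# first backward pass: the weight gradient equals the input, i.e., all ones
model(data)$backward()
g1 <- model$weight$grad$clone()

# second backward pass WITHOUT zeroing: the new gradient is added to the old one
model(data)$backward()
g2 <- model$weight$grad$clone()

# after zeroing, a backward pass yields the plain gradient again
optimizer$zero_grad()
model(data)$backward()
g3 <- model$weight$grad$clone()

all.equal(as_array(g2), 2 * as_array(g1))
all.equal(as_array(g3), as_array(g1))</code></pre>
<p>Note that <code>step()</code> is never called here, so the parameters themselves stay untouched; only the <code>grad</code> fields change.</p>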
<h2 id="simple-network-final-version">Simple network: final version</h2>
<pre class="r"><code>library(torch)
### generate training data -----------------------------------------------------
# input dimensionality (number of input features)
d_in <- 3
# output dimensionality (number of predicted features)
d_out <- 1
# number of observations in training set
n <- 100
# create random data
x <- torch_randn(n, d_in)
y <- x[, 1, NULL] * 0.2 - x[, 2, NULL] * 1.3 - x[, 3, NULL] * 0.5 + torch_randn(n, 1)
### define the network ---------------------------------------------------------
# dimensionality of hidden layer
d_hidden <- 32
model <- nn_sequential(
nn_linear(d_in, d_hidden),
nn_relu(),
nn_linear(d_hidden, d_out)
)
### network parameters ---------------------------------------------------------
# for adam, need to choose a much higher learning rate in this problem
learning_rate <- 0.08
optimizer <- optim_adam(model$parameters, lr = learning_rate)
### training loop --------------------------------------------------------------
for (t in 1:200) {
### -------- Forward pass --------
y_pred <- model(x)
### -------- compute loss --------
loss <- nnf_mse_loss(y_pred, y, reduction = "sum")
if (t %% 10 == 0)
cat("Epoch: ", t, " Loss: ", loss$item(), "\n")
### -------- Backpropagation --------
# Still need to zero out the gradients before the backward pass, only this time,
# on the optimizer object
optimizer$zero_grad()
# gradients are still computed on the loss tensor (no change here)
loss$backward()
### -------- Update weights --------
# use the optimizer to update model parameters
optimizer$step()
}</code></pre>
<p>And that’s it! We’ve seen all the major actors on stage: tensors, <em>autograd</em>, modules, loss functions, and optimizers. In future posts, we’ll explore how to use <em>torch</em> for standard deep learning tasks involving images, text, tabular data, and more. Thanks for reading!</p>
<div id="refs" class="references hanging-indent">
<div id="ref-kingma2017adam">
<p>Kingma, Diederik P., and Jimmy Ba. 2017. “Adam: A Method for Stochastic Optimization.” <a href="http://arxiv.org/abs/1412.6980">http://arxiv.org/abs/1412.6980</a>.</p>
</div>
</div>
<div class="footnotes">
<hr />
<ol>
<li id="fn1"><p>directed acyclic graph<a href="#fnref1" class="footnote-back">↩︎</a></p></li>
<li id="fn2"><p>The prefix <code>nnf_</code> was chosen because in PyTorch, the corresponding functions live in <a href="https://pytorch.org/docs/stable/nn.functional.html">torch.nn.functional</a>.<a href="#fnref2" class="footnote-back">↩︎</a></p></li>
<li id="fn3"><p>This time, the corresponding PyTorch module is <a href="https://pytorch.org/docs/stable/nn.html">torch.nn</a>.<a href="#fnref3" class="footnote-back">↩︎</a></p></li>
</ol>
</div>36d30c81f9e7940880a94bc0db25151bTorchRhttps://blogs.rstudio.com/tensorflow/posts/2020-10-09-torch-optimFri, 09 Oct 2020 00:00:00 +0000Using torch modulesSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2020-10-07-torch-modules
<p><a href="https://blogs.rstudio.com/ai/posts/2020-10-01-torch-network-from-scratch">Initially</a>, we started learning about <code>torch</code> basics by coding a simple neural network from scratch, making use of just one of <code>torch</code>’s features: <em>tensors</em>. <a href="https://blogs.rstudio.com/ai/posts/2020-10-05-torch-network-with-autograd">Then</a>, we immensely simplified the task, replacing manual backpropagation with <em>autograd</em>. Today, we <em>modularize</em> the network – in both the habitual and a very literal sense: Low-level matrix operations are swapped out for <code>torch</code> <code>module</code>s.</p>
<h2 id="modules">Modules</h2>
<p>From other frameworks (Keras, say), you may be used to distinguishing between <em>models</em> and <em>layers</em>. In <code>torch</code>, both are instances of <code>nn_Module()</code>, and thus, have some methods in common. For those thinking in terms of “models” and “layers”, I’m artificially splitting up this section into two parts. In reality though, there is no dichotomy: New modules may be composed of existing ones up to arbitrary levels of recursion.</p>
<h3 id="base-modules-layers">Base modules (“layers”)</h3>
<p>Instead of writing out an affine operation by hand – <code>x$mm(w1) + b1</code>, say –, as we’ve been doing so far, we can create a linear module. The following snippet instantiates a linear layer that expects three-feature inputs and returns a single output per observation:</p>
<pre class="r"><code>library(torch)
l <- nn_linear(3, 1)</code></pre>
<p>The module has two parameters, “weight” and “bias”. Both now come pre-initialized:</p>
<pre class="r"><code>l$parameters</code></pre>
<pre><code>$weight
torch_tensor
-0.0385 0.1412 -0.5436
[ CPUFloatType{1,3} ]
$bias
torch_tensor
-0.1950
[ CPUFloatType{1} ]</code></pre>
<p>Modules are callable; calling a module executes its <code>forward()</code> method, which, for a linear layer, matrix-multiplies input and weights, and adds the bias.</p>
<p>Let’s try this:</p>
<pre class="r"><code>data <- torch_randn(10, 3)
out <- l(data)</code></pre>
<p>Unsurprisingly, <code>out</code> now holds some data:</p>
<pre class="r"><code>out$data()</code></pre>
<pre><code>torch_tensor
0.2711
-1.8151
-0.0073
0.1876
-0.0930
0.7498
-0.2332
-0.0428
0.3849
-0.2618
[ CPUFloatType{10,1} ]</code></pre>
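<p>To make the connection to the hand-written affine operation explicit, the following sketch compares a linear module’s output to the same computation done manually. The names <code>lin</code> and <code>inputs</code> are chosen so as not to overwrite the objects used in the running example:</p>
<pre class="r"><code>library(torch)

lin <- nn_linear(3, 1)
inputs <- torch_randn(10, 3)

# calling the module runs its forward() method
from_module <- lin(inputs)

# the weight matrix is stored as (output features, input features),
# so it has to be transposed before the matrix multiplication
by_hand <- inputs$mm(lin$weight$t())$add(lin$bias)

# compare as R arrays, allowing for single-precision rounding
all.equal(
  as_array(from_module$detach()),
  as_array(by_hand$detach()),
  tolerance = 1e-5
)</code></pre>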
<p>In addition though, this tensor knows what will need to be done, should it ever be asked to calculate gradients:</p>
<pre class="r"><code>out$grad_fn</code></pre>
<pre><code>AddmmBackward</code></pre>
<p>Note the difference between tensors returned by modules and self-created ones. When creating tensors ourselves, we need to pass <code>requires_grad = TRUE</code> to trigger gradient calculation. With modules, <code>torch</code> correctly assumes that we’ll want to perform backpropagation at some point.</p>
<p>So far, though, we haven’t called <code>backward()</code>. Thus, no gradients have been computed yet:</p>
<pre class="r"><code>l$weight$grad
l$bias$grad</code></pre>
<pre><code>torch_tensor
[ Tensor (undefined) ]
torch_tensor
[ Tensor (undefined) ]</code></pre>
<p>Let’s change this:</p>
<pre class="r"><code>out$backward()</code></pre>
<pre><code>Error in (function (self, gradient, keep_graph, create_graph) :
grad can be implicitly created only for scalar outputs (_make_grads at ../torch/csrc/autograd/autograd.cpp:47)</code></pre>
<p>Why the error? <em>Autograd</em> expects the output tensor to be a scalar, while in our example, we have a tensor of size <code>(10, 1)</code>. This error won’t often occur in practice, where we work with <em>batches</em> of inputs (sometimes, just a single batch). But still, it’s interesting to see how to resolve this.</p>
<p>To make the example work, we introduce a – virtual – final aggregation step – taking the mean, say. Let’s call it <code>avg</code>. If such a mean were taken, its gradient with respect to <code>l$weight</code> would be obtained via the chain rule:</p>
<p><span class="math display">\[
\begin{equation*}
\frac{\partial \ avg}{\partial w} = \frac{\partial \ avg}{\partial \ out} \ \frac{\partial \ out}{\partial w}
\end{equation*}
\]</span></p>
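<p>Of the quantities on the right side, we’re interested in the second. We need to provide the first one ourselves. Had we really taken the mean over the ten outputs, each of its entries would be 1/10; for the sake of demonstration, we pass a tensor filled with the constant 10 instead, which simply scales the resulting gradients by a factor of 100 relative to those of the mean:</p>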
<pre class="r"><code>d_avg_d_out <- torch_tensor(10)$`repeat`(10)$unsqueeze(1)$t()
out$backward(gradient = d_avg_d_out)</code></pre>
<p>Now, <code>l$weight$grad</code> and <code>l$bias$grad</code> <em>do</em> contain gradients:</p>
<pre class="r"><code>l$weight$grad
l$bias$grad</code></pre>
<pre><code>torch_tensor
1.3410 6.4343 -30.7135
[ CPUFloatType{1,3} ]
torch_tensor
100
[ CPUFloatType{1} ]</code></pre>
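<p>For comparison, here is a sketch of what happens when the aggregation step is real. For a mean over ten outputs, every entry of the gradient of <code>avg</code> with respect to <code>out</code> equals 1/10, so supplying that tensor by hand should yield the same weight gradients as calling <code>backward()</code> on the mean itself. Object names are illustrative and kept separate from the running example:</p>
<pre class="r"><code>library(torch)

lin <- nn_linear(3, 1)
inputs <- torch_randn(10, 3)

# (a) actually take the mean, and let autograd handle the complete chain
lin(inputs)$mean()$backward()
grad_from_mean <- lin$weight$grad$clone()

# reset the gradients before the second run
lin$zero_grad()

# (b) stop at the layer output, supplying d avg / d out = 1/10 per element
lin(inputs)$backward(gradient = torch_ones(10, 1) / 10)
grad_by_hand <- lin$weight$grad$clone()

all.equal(as_array(grad_from_mean), as_array(grad_by_hand), tolerance = 1e-5)</code></pre>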
<p>In addition to <code>nn_linear()</code>, <code>torch</code> provides pretty much all the common layers you might hope for. But few tasks are solved by a single layer. How do you combine them? Or, in the usual lingo: How do you build <em>models</em>?</p>
<h3 id="container-modules-models">Container modules (“models”)</h3>
<p>Now, <em>models</em> are just modules that contain other modules. If all inputs are supposed to flow through the same nodes and along the same edges, then <code>nn_sequential()</code> can be used to build a simple graph.</p>
<p>For example:</p>
<pre class="r"><code>model <- nn_sequential(
nn_linear(3, 16),
nn_relu(),
nn_linear(16, 1)
)</code></pre>
<p>We can use the same technique as above to get an overview of all model parameters (two weight matrices and two bias vectors):</p>
<pre class="r"><code>model$parameters</code></pre>
<pre><code>$`0.weight`
torch_tensor
-0.1968 -0.1127 -0.0504
0.0083 0.3125 0.0013
0.4784 -0.2757 0.2535
-0.0898 -0.4706 -0.0733
-0.0654 0.5016 0.0242
0.4855 -0.3980 -0.3434
-0.3609 0.1859 -0.4039
0.2851 0.2809 -0.3114
-0.0542 -0.0754 -0.2252
-0.3175 0.2107 -0.2954
-0.3733 0.3931 0.3466
0.5616 -0.3793 -0.4872
0.0062 0.4168 -0.5580
0.3174 -0.4867 0.0904
-0.0981 -0.0084 0.3580
0.3187 -0.2954 -0.5181
[ CPUFloatType{16,3} ]
$`0.bias`
torch_tensor
-0.3714
0.5603
-0.3791
0.4372
-0.1793
-0.3329
0.5588
0.1370
0.4467
0.2937
0.1436
0.1986
0.4967
0.1554
-0.3219
-0.0266
[ CPUFloatType{16} ]
$`2.weight`
torch_tensor
Columns 1 to 10 -0.0908 -0.1786 0.0812 -0.0414 -0.0251 -0.1961 0.2326 0.0943 -0.0246 0.0748
Columns 11 to 16 0.2111 -0.1801 -0.0102 -0.0244 0.1223 -0.1958
[ CPUFloatType{1,16} ]
$`2.bias`
torch_tensor
0.2470
[ CPUFloatType{1} ]</code></pre>
<p>To inspect an individual parameter, make use of its position in the sequential model. For example:</p>
<pre class="r"><code>model[[1]]$bias</code></pre>
<pre><code>torch_tensor
-0.3714
0.5603
-0.3791
0.4372
-0.1793
-0.3329
0.5588
0.1370
0.4467
0.2937
0.1436
0.1986
0.4967
0.1554
-0.3219
-0.0266
[ CPUFloatType{16} ]</code></pre>
<p>And just like <code>nn_linear()</code> above, this module can be called directly on data:</p>
<pre class="r"><code>out <- model(data)</code></pre>
<p>On a composite module like this one, calling <code>backward()</code> will backpropagate through all the layers:</p>
<pre class="r"><code>out$backward(gradient = torch_tensor(10)$`repeat`(10)$unsqueeze(1)$t())
# e.g.
model[[1]]$bias$grad</code></pre>
<pre><code>torch_tensor
0.0000
-17.8578
1.6246
-3.7258
-0.2515
-5.8825
23.2624
8.4903
-2.4604
6.7286
14.7760
-14.4064
-1.0206
-1.7058
0.0000
-9.7897
[ CPUFloatType{16} ]</code></pre>
<p>And placing the composite module on the GPU will move all tensors there:</p>
<pre class="r"><code>model$cuda()
model[[1]]$bias$grad</code></pre>
<pre><code>torch_tensor
0.0000
-17.8578
1.6246
-3.7258
-0.2515
-5.8825
23.2624
8.4903
-2.4604
6.7286
14.7760
-14.4064
-1.0206
-1.7058
0.0000
-9.7897
[ CUDAFloatType{16} ]</code></pre>
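<p><code>nn_sequential()</code> is not the only way to compose modules. As a hedged sketch, the same three-step architecture can be written as a custom module using <code>nn_module()</code>. The module name <code>two_layer_net</code> and the field names <code>fc1</code> and <code>fc2</code> are made up for this example:</p>
<pre class="r"><code>library(torch)

# a custom container module; names are hypothetical, chosen for illustration
two_layer_net <- nn_module(
  "two_layer_net",
  # sub-modules are created once, and stored as fields on self
  initialize = function(d_in, d_hidden, d_out) {
    self$fc1 <- nn_linear(d_in, d_hidden)
    self$fc2 <- nn_linear(d_hidden, d_out)
  },
  # forward() defines how data flow through the sub-modules
  forward = function(x) {
    self$fc2(nnf_relu(self$fc1(x)))
  }
)

net <- two_layer_net(3, 16, 1)

# parameters of the sub-modules are collected automatically
names(net$parameters)

# like any module, the container is callable
net(torch_randn(10, 3))$shape</code></pre>
<p>Writing out <code>forward()</code> explicitly is more verbose than <code>nn_sequential()</code>, but it allows for data flows that are not purely sequential.</p>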
<p>Now let’s see how using <code>nn_sequential()</code> can simplify our example network.</p>
<h2 id="simple-network-using-modules">Simple network using modules</h2>
<pre class="r"><code>### generate training data -----------------------------------------------------
# input dimensionality (number of input features)
d_in <- 3
# output dimensionality (number of predicted features)
d_out <- 1
# number of observations in training set
n <- 100
# create random data
x <- torch_randn(n, d_in)
y <- x[, 1, NULL] * 0.2 - x[, 2, NULL] * 1.3 - x[, 3, NULL] * 0.5 + torch_randn(n, 1)
### define the network ---------------------------------------------------------
# dimensionality of hidden layer
d_hidden <- 32
model <- nn_sequential(
nn_linear(d_in, d_hidden),
nn_relu(),
nn_linear(d_hidden, d_out)
)
### network parameters ---------------------------------------------------------
learning_rate <- 1e-4
### training loop --------------------------------------------------------------
for (t in 1:200) {
### -------- Forward pass --------
y_pred <- model(x)
### -------- compute loss --------
loss <- (y_pred - y)$pow(2)$sum()
if (t %% 10 == 0)
cat("Epoch: ", t, " Loss: ", loss$item(), "\n")
### -------- Backpropagation --------
# Zero the gradients before running the backward pass.
model$zero_grad()
# compute gradient of the loss w.r.t. all learnable parameters of the model
loss$backward()
### -------- Update weights --------
# Wrap in with_no_grad() because this is a part we DON'T want to record
# for automatic gradient computation
# Update each parameter by its `grad`
with_no_grad({
model$parameters %>% purrr::walk(function(param) param$sub_(learning_rate * param$grad))
})
}</code></pre>
<p>The forward pass looks a lot better now; however, we still loop through the model’s parameters and update each one by hand. Furthermore, you may already be suspecting that <code>torch</code> provides abstractions for common loss functions. In the next and last installment of this series, we’ll address both points, making use of <code>torch</code> losses and optimizers. See you then!</p>
89db528ec58a4b4da856d3b6eea438b7TorchRhttps://blogs.rstudio.com/tensorflow/posts/2020-10-07-torch-modulesWed, 07 Oct 2020 00:00:00 +0000Introducing torch autogradSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2020-10-05-torch-network-with-autograd
<p>Last week, we saw how to code <a href="https://blogs.rstudio.com/ai/posts/2020-10-01-torch-network-from-scratch">a simple network from scratch</a>, using nothing but <code>torch</code> <em>tensors</em>. Predictions, loss, gradients, weight updates – all these things we’ve been computing ourselves. Today, we make a significant change: Namely, we spare ourselves the cumbersome calculation of gradients, and have <code>torch</code> do it for us.</p>
<p>Prior to that though, let’s get some background.</p>
<h2 id="automatic-differentiation-with-autograd">Automatic differentiation with <em>autograd</em></h2>
<p><code>torch</code> uses a module called <em>autograd</em> to</p>
<ol style="list-style-type: decimal">
<li><p>record operations performed on tensors, and</p></li>
<li><p>store what will have to be done to obtain the corresponding gradients, once we’re entering the backward pass.</p></li>
</ol>
<p>These prospective actions are stored internally as functions, and when it’s time to compute the gradients, these functions are applied in order: Application starts from the output node, and calculated gradients are successively <em>propagated</em> <em>back</em> through the network. This is a form of <em>reverse mode automatic differentiation</em>.</p>
<h4 id="autograd-basics"><em>Autograd</em> basics</h4>
<p>As users, we can see a bit of the implementation. As a prerequisite for this “recording” to happen, tensors have to be created with <code>requires_grad = TRUE</code>. For example:</p>
<pre class="r"><code>library(torch)
x <- torch_ones(2, 2, requires_grad = TRUE)</code></pre>
<p>To be clear, <code>x</code> now is a tensor <em>with respect to which</em> gradients have to be calculated – normally, a tensor representing a weight or a bias, not the input data<a href="#fn1" class="footnote-ref" id="fnref1"><sup>1</sup></a>. If we subsequently perform some operation on that tensor, assigning the result to <code>y</code>,</p>
<pre class="r"><code>y <- x$mean()</code></pre>
<p>we find that <code>y</code> now has a non-empty <code>grad_fn</code> that tells <code>torch</code> how to compute the gradient of <code>y</code> with respect to <code>x</code>:</p>
<pre class="r"><code>y$grad_fn</code></pre>
<pre><code>MeanBackward0</code></pre>
<p>Actual <em>computation</em> of gradients is triggered by calling <code>backward()</code> on the output tensor.</p>
<pre class="r"><code>y$backward()</code></pre>
<p>After <code>backward()</code> has been called, <code>x</code> has a non-null field termed <code>grad</code> that stores the gradient of <code>y</code> with respect to <code>x</code>:</p>
<pre class="r"><code>x$grad</code></pre>
<pre><code>torch_tensor
0.2500 0.2500
0.2500 0.2500
[ CPUFloatType{2,2} ]</code></pre>
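<p>This value can be checked by hand. Since <code>y</code> is the mean over the four entries of <code>x</code>, every entry contributes with the same weight:</p>
<p><span class="math display">\[
\begin{equation*}
y = \frac{1}{4} \sum_{i,j} x_{ij} \quad \Rightarrow \quad \frac{\partial y}{\partial x_{ij}} = \frac{1}{4} = 0.25
\end{equation*}
\]</span></p>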
<p>With longer chains of computations, we can take a glance at how <code>torch</code> builds up a graph of backward operations. Here is a slightly more complex example – feel free to skip if you’re not the type who just <em>has</em> to peek into things for them to make sense.</p>
<h4 id="digging-deeper">Digging deeper</h4>
<p>We build up a simple graph of tensors, with inputs <code>x1</code> and <code>x2</code> being connected to output <code>out</code> by intermediaries <code>y</code> and <code>z</code>.</p>
<pre class="r"><code>x1 <- torch_ones(2, 2, requires_grad = TRUE)
x2 <- torch_tensor(1.1, requires_grad = TRUE)
y <- x1 * (x2 + 2)
z <- y$pow(2) * 3
out <- z$mean()</code></pre>
<p>To save memory, intermediate gradients are normally not stored. Calling <code>retain_grad()</code> on a tensor allows one to deviate from this default. Let’s do this here, for the sake of demonstration:</p>
<pre class="r"><code>y$retain_grad()
z$retain_grad()</code></pre>
<p>Now we can go backwards through the graph and inspect <code>torch</code>’s action plan for backprop, starting from <code>out$grad_fn</code>, like so:</p>
<pre class="r"><code># how to compute the gradient for mean, the last operation executed
out$grad_fn</code></pre>
<pre><code>MeanBackward0</code></pre>
<pre class="r"><code># how to compute the gradient for the multiplication by 3 in z = y$pow(2) * 3
out$grad_fn$next_functions</code></pre>
<pre><code>[[1]]
MulBackward1</code></pre>
<pre class="r"><code># how to compute the gradient for pow in z = y$pow(2) * 3
out$grad_fn$next_functions[[1]]$next_functions</code></pre>
<pre><code>[[1]]
PowBackward0</code></pre>
<pre class="r"><code># how to compute the gradient for the multiplication in y = x1 * (x2 + 2)
out$grad_fn$next_functions[[1]]$next_functions[[1]]$next_functions</code></pre>
<pre><code>[[1]]
MulBackward0</code></pre>
<pre class="r"><code># how to compute the gradient for the two branches of y = x1 * (x2 + 2),
# where the left branch is a leaf node (AccumulateGrad for x1)
out$grad_fn$next_functions[[1]]$next_functions[[1]]$next_functions[[1]]$next_functions</code></pre>
<pre><code>[[1]]
torch::autograd::AccumulateGrad
[[2]]
AddBackward1</code></pre>
<pre class="r"><code># here we arrive at the other leaf node (AccumulateGrad for x2)
out$grad_fn$next_functions[[1]]$next_functions[[1]]$next_functions[[1]]$next_functions[[2]]$next_functions</code></pre>
<pre><code>[[1]]
torch::autograd::AccumulateGrad</code></pre>
<p>If we now call <code>out$backward()</code>, all tensors in the graph will have their respective gradients calculated.</p>
<pre class="r"><code>out$backward()
z$grad
y$grad
x2$grad
x1$grad</code></pre>
<pre><code>torch_tensor
0.2500 0.2500
0.2500 0.2500
[ CPUFloatType{2,2} ]
torch_tensor
4.6500 4.6500
4.6500 4.6500
[ CPUFloatType{2,2} ]
torch_tensor
18.6000
[ CPUFloatType{1} ]
torch_tensor
14.4150 14.4150
14.4150 14.4150
[ CPUFloatType{2,2} ]</code></pre>
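<p>These numbers can be reproduced by applying the chain rule by hand. With all entries of <code>x1</code> equal to 1 and <code>x2</code> equal to 1.1, every entry of <code>y</code> equals 3.1, and we obtain, in the order printed above:</p>
<p><span class="math display">\[
\begin{aligned}
\frac{\partial \ out}{\partial z_{ij}} &= \frac{1}{4} = 0.25 \\
\frac{\partial \ out}{\partial y_{ij}} &= \frac{1}{4} \cdot 3 \cdot 2 \, y_{ij} = 1.5 \cdot 3.1 = 4.65 \\
\frac{\partial \ out}{\partial x_2} &= \sum_{i,j} \frac{\partial \ out}{\partial y_{ij}} \, x_{1,ij} = 4 \cdot 4.65 = 18.6 \\
\frac{\partial \ out}{\partial x_{1,ij}} &= \frac{\partial \ out}{\partial y_{ij}} \, (x_2 + 2) = 4.65 \cdot 3.1 = 14.415
\end{aligned}
\]</span></p>
<p>Note how the gradient for <code>x2</code> is a sum: being a single value that is broadcast over all four entries of <code>x1</code>, it collects a contribution from each of them.</p>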
<p>After this nerdy excursion, let’s see how <em>autograd</em> makes our network simpler.</p>
<h2 id="the-simple-network-now-using-autograd">The simple network, now using <em>autograd</em></h2>
<p>Thanks to <em>autograd</em>, we say good-bye to the tedious, error-prone process of coding backpropagation ourselves. A single method call does it all: <code>loss$backward()</code>.</p>
<p>With <code>torch</code> keeping track of operations as required, we don’t even have to explicitly name the intermediate tensors any more. We can code forward pass, loss calculation, and backward pass in just three lines:</p>
<pre class="r"><code>y_pred <- x$mm(w1)$add(b1)$clamp(min = 0)$mm(w2)$add(b2)
loss <- (y_pred - y)$pow(2)$sum()
loss$backward()</code></pre>
<p>Here is the complete code. We’re at an intermediate stage: We still manually compute the forward pass and the loss, and we still manually update the weights. Due to the latter, there is something I need to explain. But I’ll let you check out the new version first:</p>
<pre class="r"><code>library(torch)
### generate training data -----------------------------------------------------
# input dimensionality (number of input features)
d_in <- 3
# output dimensionality (number of predicted features)
d_out <- 1
# number of observations in training set
n <- 100
# create random data
x <- torch_randn(n, d_in)
y <- x[, 1, NULL] * 0.2 - x[, 2, NULL] * 1.3 - x[, 3, NULL] * 0.5 + torch_randn(n, 1)
### initialize weights ---------------------------------------------------------
# dimensionality of hidden layer
d_hidden <- 32
# weights connecting input to hidden layer
w1 <- torch_randn(d_in, d_hidden, requires_grad = TRUE)
# weights connecting hidden to output layer
w2 <- torch_randn(d_hidden, d_out, requires_grad = TRUE)
# hidden layer bias
b1 <- torch_zeros(1, d_hidden, requires_grad = TRUE)
# output layer bias
b2 <- torch_zeros(1, d_out, requires_grad = TRUE)
### network parameters ---------------------------------------------------------
learning_rate <- 1e-4
### training loop --------------------------------------------------------------
for (t in 1:200) {
### -------- Forward pass --------
y_pred <- x$mm(w1)$add(b1)$clamp(min = 0)$mm(w2)$add(b2)
### -------- compute loss --------
loss <- (y_pred - y)$pow(2)$sum()
if (t %% 10 == 0)
cat("Epoch: ", t, " Loss: ", loss$item(), "\n")
### -------- Backpropagation --------
# compute gradient of loss w.r.t. all tensors with requires_grad = TRUE
loss$backward()
### -------- Update weights --------
# Wrap in with_no_grad() because this is a part we DON'T
# want to record for automatic gradient computation
with_no_grad({
w1 <- w1$sub_(learning_rate * w1$grad)
w2 <- w2$sub_(learning_rate * w2$grad)
b1 <- b1$sub_(learning_rate * b1$grad)
b2 <- b2$sub_(learning_rate * b2$grad)
# Zero gradients after every pass, as they'd accumulate otherwise
w1$grad$zero_()
w2$grad$zero_()
b1$grad$zero_()
b2$grad$zero_()
})
}</code></pre>
<p>As explained above, after <code>some_tensor$backward()</code>, all tensors preceding it in the graph<a href="#fn2" class="footnote-ref" id="fnref2"><sup>2</sup></a> will have their <code>grad</code> fields populated. We make use of these fields to update the weights. But now that <em>autograd</em> is “on”, whenever we execute an operation we <em>don’t</em> want recorded for backprop, we need to explicitly exempt it: This is why we wrap the weight updates in a call to <code>with_no_grad()</code>.</p>
<p>While this is something you may file under “nice to know” – after all, once we arrive at the last post in the series, this manual updating of weights will be gone – the idiom of <em>zeroing gradients</em> is here to stay: Values stored in <code>grad</code> fields accumulate; whenever we’re done using them, we need to zero them out before reuse.</p>
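<p>Both idioms can be seen at work on a single toy tensor. The sketch below uses a made-up loss whose gradient is known in advance, applies one manual update step inside <code>with_no_grad()</code>, and zeroes the gradient afterwards. The tensor name <code>w</code> and the step size of 0.1 are arbitrary choices for this example:</p>
<pre class="r"><code>library(torch)

w <- torch_ones(2, requires_grad = TRUE)

# loss = sum(3 * w), so the gradient w.r.t. each entry of w is 3
loss <- (w * 3)$sum()
loss$backward()

with_no_grad({
  # in-place update that is not recorded by autograd: 1 - 0.1 * 3 = 0.7
  w$sub_(0.1 * w$grad)
  # reset the gradient so the next backward pass starts from zero
  w$grad$zero_()
})

# w has been updated, its gradient is zero again,
# and it is still a tensor gradients are computed for
as_array(w$detach())
as_array(w$grad)
w$requires_grad</code></pre>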
<h2 id="outlook">Outlook</h2>
<p>So where do we stand? We started out coding a network completely from scratch, making use of nothing but <code>torch</code> tensors. Today, we got significant help from <em>autograd</em>.</p>
<p>But we’re still manually updating the weights – and aren’t deep learning frameworks known to provide abstractions (“layers”, or: “modules”) on top of tensor computations …?</p>
<p>We address both issues in the follow-up installments. Thanks for reading!</p>
<div class="footnotes">
<hr />
<ol>
<li id="fn1"><p>Unless we <em>want</em> to change the data, as when generating adversarial examples.<a href="#fnref1" class="footnote-back">↩︎</a></p></li>
<li id="fn2"><p>All that have <code>requires_grad</code> set to <code>TRUE</code>, to be precise.<a href="#fnref2" class="footnote-back">↩︎</a></p></li>
</ol>
</div>68b4457f4594ac1a742e88e6d9c790b4TorchRhttps://blogs.rstudio.com/tensorflow/posts/2020-10-05-torch-network-with-autogradMon, 05 Oct 2020 00:00:00 +0000Getting familiar with torch tensorsSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2020-10-01-torch-network-from-scratch
In this first installment of a four-part miniseries, we present the main things you will want to know about torch tensors. As an illustrative example, we'll code a simple neural network from scratch.TorchRhttps://blogs.rstudio.com/tensorflow/posts/2020-10-01-torch-network-from-scratchThu, 01 Oct 2020 00:00:00 +0000sparklyr 1.4: Weighted Sampling, Tidyr Verbs, Robust Scaler, RAPIDS, and moreYitao Li
https://blogs.rstudio.com/tensorflow/posts/2020-09-30-sparklyr-1.4.0-released
Sparklyr 1.4 is now available! This release comes with delightful new features such as weighted sampling and tidyr verbs support for Spark dataframes, robust scaler for standardizing data based on median and interquartile range, spark_connect interface for RAPIDS GPU acceleration plugin, as well as a number of dplyr-related improvements.RPackages/ReleasesDistributed Computinghttps://blogs.rstudio.com/tensorflow/posts/2020-09-30-sparklyr-1.4.0-releasedWed, 30 Sep 2020 00:00:00 +0000Please allow me to introduce myself: Torch for RSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2020-09-29-introducing-torch-for-r
Today, we are excited to introduce torch, an R package that allows you to use PyTorch-like functionality natively from R. No Python installation is required: torch is built directly on top of libtorch, a C++ library that provides the tensor-computation and automatic-differentiation capabilities essential to building neural networks.Packages/ReleasesTorchRhttps://blogs.rstudio.com/tensorflow/posts/2020-09-29-introducing-torch-for-rTue, 29 Sep 2020 00:00:00 +0000Introducing sparklyr.flint: A time-series extension for sparklyrYitao Li
https://blogs.rstudio.com/tensorflow/posts/2020-09-07-sparklyr-flint
We are pleased to announce that sparklyr.flint, a sparklyr extension for analyzing time series at scale with Flint, is now available on CRAN. Flint is an open-source library for working with time-series in Apache Spark which supports aggregates and joins on time-series datasets.RTime Serieshttps://blogs.rstudio.com/tensorflow/posts/2020-09-07-sparklyr-flintMon, 07 Sep 2020 00:00:00 +0000An introduction to weather forecasting with deep learningSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2020-09-01-weather-prediction
A few weeks ago, we showed how to forecast chaotic dynamical systems with deep learning, augmented by a custom constraint derived from domain-specific insight. Global weather is a chaotic system, but of much higher complexity than many tasks commonly addressed with machine and/or deep learning. In this post, we provide a practical introduction featuring a simple deep learning baseline for atmospheric forecasting. While far away from being competitive, it serves to illustrate how more sophisticated and compute-intensive models may approach that formidable task by means of methods situated on the "black-box end" of the continuum.RTensorFlow/KerasTime Serieshttps://blogs.rstudio.com/tensorflow/posts/2020-09-01-weather-predictionTue, 01 Sep 2020 00:00:00 +0000Training ImageNet with RJavier Luraschi
https://blogs.rstudio.com/tensorflow/posts/2020-08-24-training-imagenet-with-r
This post explores how to train on large datasets with TensorFlow and R. Specifically, we present how to download and repartition ImageNet, followed by training ImageNet across multiple GPUs in distributed environments using TensorFlow and Apache Spark.RTensorFlow/KerasDistributed ComputingData Managementhttps://blogs.rstudio.com/tensorflow/posts/2020-08-24-training-imagenet-with-rMon, 24 Aug 2020 00:00:00 +0000Deepfake detection challenge from RTurgut Abdullayev
https://blogs.rstudio.com/tensorflow/posts/2020-08-18-deepfake
A couple of months ago, Amazon, Facebook, Microsoft, and other contributors initiated a challenge consisting of telling apart real and AI-generated ("fake") videos. We show how to approach this challenge from R.Image Recognition & Image Processinghttps://blogs.rstudio.com/tensorflow/posts/2020-08-18-deepfakeTue, 18 Aug 2020 00:00:00 +0000FNN-VAE for noisy time series forecastingSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2020-07-31-fnn-vae-for-noisy-timeseries
In the last part of this mini-series on forecasting with false nearest neighbors (FNN) loss, we replace the LSTM autoencoder from the previous post by a convolutional VAE, resulting in equivalent prediction performance but significantly lower training time. In addition, we find that FNN regularization is of great help when an underlying deterministic process is obscured by substantial noise.RTensorFlow/KerasTime SeriesUnsupervised Learninghttps://blogs.rstudio.com/tensorflow/posts/2020-07-31-fnn-vae-for-noisy-timeseriesFri, 31 Jul 2020 00:00:00 +0000State-of-the-art NLP models from RTurgut Abdullayev
https://blogs.rstudio.com/tensorflow/posts/2020-07-30-state-of-the-art-nlp-models-from-r
Nowadays, Microsoft, Google, Facebook, and OpenAI are sharing lots of state-of-the-art models in the field of Natural Language Processing. However, fewer materials exist on how to use these models from R. In this post, we will show how R users can access and benefit from these models as well.Natural Language Processinghttps://blogs.rstudio.com/tensorflow/posts/2020-07-30-state-of-the-art-nlp-models-from-rThu, 30 Jul 2020 00:00:00 +0000Parallelized sampling using exponential variatesYitao Li
https://blogs.rstudio.com/tensorflow/posts/2020-07-29-parallelized-sampling
How can the seemingly iterative process of weighted sampling without replacement be transformed into something highly parallelizable? Turns out a well-known technique based on exponential variates accomplishes exactly that.ConceptsDistributed Computinghttps://blogs.rstudio.com/tensorflow/posts/2020-07-29-parallelized-samplingWed, 29 Jul 2020 00:00:00 +0000Time series prediction with FNN-LSTMSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2020-07-20-fnn-lstm
In a recent post, we showed how an LSTM autoencoder, regularized by false nearest neighbors (FNN) loss, can be used to reconstruct the attractor of a nonlinear, chaotic dynamical system. Here, we explore how that same technique assists in prediction. Matched up with a comparable, capacity-wise, "vanilla LSTM", FNN-LSTM improves performance on a set of very different, real-world datasets, especially for the initial steps in a multi-step forecast.RTensorFlow/KerasTime SeriesUnsupervised Learninghttps://blogs.rstudio.com/tensorflow/posts/2020-07-20-fnn-lstmMon, 20 Jul 2020 00:00:00 +0000sparklyr 1.3: Higher-order Functions, Avro and Custom SerializersYitao Li
https://blogs.rstudio.com/tensorflow/posts/2020-07-16-sparklyr-1.3.0-released
Sparklyr 1.3 is now available, featuring exciting new functionalities such as integration of Spark higher-order functions and data import/export in Avro and in user-defined serialization formats.Packages/ReleasesDistributed Computinghttps://blogs.rstudio.com/tensorflow/posts/2020-07-16-sparklyr-1.3.0-releasedThu, 16 Jul 2020 00:00:00 +0000Deep attractors: Where deep learning meets chaosSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2020-06-24-deep-attractors
In nonlinear dynamics, when the state space is thought to be multidimensional but all we have for data is just a univariate time series, one may attempt to reconstruct the true space via delay coordinate embeddings. However, it is not clear a priori how to choose dimensionality and time lag of the reconstruction space. In this post, we show how to use an autoencoder architecture to circumvent the problem: Given just a scalar series of observations, the autoencoder directly learns to represent attractors of chaotic systems in adequate dimensionality.RTensorFlow/KerasTime SeriesUnsupervised Learninghttps://blogs.rstudio.com/tensorflow/posts/2020-06-24-deep-attractorsWed, 24 Jun 2020 00:00:00 +0000Easy PixelCNN with tfprobabilitySigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2020-05-29-pixelcnn
PixelCNN is a deep learning architecture - or bundle of architectures - designed to generate highly realistic-looking images. To use it, no reverse-engineering of arXiv papers or search for reference implementations is required: TensorFlow Probability and its R wrapper, tfprobability, now include a PixelCNN distribution that can be used to train a straightforwardly-defined neural network in a parameterizable way.RImage Recognition & Image ProcessingTensorFlow/KerasProbabilistic ML/DLUnsupervised Learninghttps://blogs.rstudio.com/tensorflow/posts/2020-05-29-pixelcnnFri, 29 May 2020 00:00:00 +0000Hacking deep learning: model inversion attack by exampleSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2020-05-15-model-inversion-attacks
Compared to other applications, deep learning models might not seem too likely as victims of privacy attacks. However, methods exist to determine whether an entity was used in the training set (an adversarial attack called membership inference), and techniques subsumed under "model inversion" make it possible to reconstruct raw input data given just model output (and sometimes, context information). This post shows an end-to-end example of model inversion, and explores mitigation strategies using TensorFlow Privacy.RPrivacy & SecurityTensorFlow/Kerashttps://blogs.rstudio.com/tensorflow/posts/2020-05-15-model-inversion-attacksFri, 15 May 2020 00:00:00 +0000Towards privacy: Encrypted deep learning with Syft and KerasSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2020-04-29-encrypted_keras_with_syft
Deep learning need not be irreconcilable with privacy protection. Federated learning enables on-device, distributed model training; encryption keeps model and gradient updates private; differential privacy prevents the training data from leaking. As of today, private and secure deep learning is an emerging technology. In this post, we introduce Syft, an open-source framework that integrates with PyTorch as well as TensorFlow. In an example use case, we obtain private predictions from a Keras model.RPrivacy & SecurityTensorFlow/Kerashttps://blogs.rstudio.com/tensorflow/posts/2020-04-29-encrypted_keras_with_syftWed, 29 Apr 2020 00:00:00 +0000sparklyr 1.2: Foreach, Spark 3.0 and Databricks ConnectYitao Li
https://blogs.rstudio.com/tensorflow/posts/2020-04-21-sparklyr-1.2.0-released
A new sparklyr release is now available. This sparklyr 1.2 release features new functionalities such as support for Databricks Connect, a Spark backend for the 'foreach' package, inter-op improvements for working with Spark 3.0 preview, as well as a number of bug fixes and improvements addressing user-visible pain points.RPackages/ReleasesDistributed Computinghttps://blogs.rstudio.com/tensorflow/posts/2020-04-21-sparklyr-1.2.0-releasedTue, 21 Apr 2020 00:00:00 +0000pins 0.4: VersioningJavier Luraschi
https://blogs.rstudio.com/tensorflow/posts/2020-04-13-pins-04
A new release of pins is available on CRAN today. This release adds support for time travel across dataset versions, which improves collaboration and protects your code from breaking when remote resources change unexpectedly.RPackages/ReleasesData Managementhttps://blogs.rstudio.com/tensorflow/posts/2020-04-13-pins-04Mon, 13 Apr 2020 00:00:00 +0000A first look at federated learning with TensorFlowSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2020-04-08-tf-federated-intro
The term "federated learning" was coined to describe a form of distributed model training where the data remains on client devices, i.e., is never shipped to the coordinating server. In this post, we introduce central concepts and run first experiments with TensorFlow Federated, using R.Privacy & SecurityTensorFlow/Kerashttps://blogs.rstudio.com/tensorflow/posts/2020-04-08-tf-federated-introWed, 08 Apr 2020 00:00:00 +0000Introducing: The RStudio AI BlogThe Multiverse Team
https://blogs.rstudio.com/tensorflow/posts/2020-04-01-rstudio-ai-blog
This blog just got a new title: RStudio AI Blog. We explain why.Metahttps://blogs.rstudio.com/tensorflow/posts/2020-04-01-rstudio-ai-blogMon, 30 Mar 2020 00:00:00 +0000Infinite surprise - the iridescent personality of Kullback-Leibler divergenceSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2020-02-19-kl-divergence
Kullback-Leibler divergence is not just used to train variational autoencoders or Bayesian networks (and not just a hard-to-pronounce thing). It is a fundamental concept in information theory, put to use in a vast range of applications. Most interestingly, it's not always about constraint, regularization or compression. Quite the contrary, sometimes it is about novelty, discovery and surprise.Probabilistic ML/DLConceptshttps://blogs.rstudio.com/tensorflow/posts/2020-02-19-kl-divergenceWed, 19 Feb 2020 00:00:00 +0000NumPy-style broadcasting for R TensorFlow usersSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2020-01-24-numpy-broadcasting
Broadcasting, as done by Python's scientific computing library NumPy, involves dynamically extending shapes so that arrays of different sizes may be passed to operations that expect conformity - such as adding or multiplying elementwise. In NumPy, the way broadcasting works is specified exactly; the same rules apply to TensorFlow operations. For anyone who finds herself, occasionally, consulting Python code, this post strives to explain.TensorFlow/KerasConceptshttps://blogs.rstudio.com/tensorflow/posts/2020-01-24-numpy-broadcastingFri, 24 Jan 2020 00:00:00 +0000First experiments with TensorFlow mixed-precision trainingSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2020-01-13-mixed-precision-training
TensorFlow 2.1, released last week, allows for mixed-precision training, making use of the Tensor Cores available in the most recent NVIDIA GPUs. In this post, we report first experimental results and provide some background on what this is all about.TensorFlow/Kerashttps://blogs.rstudio.com/tensorflow/posts/2020-01-13-mixed-precision-trainingMon, 13 Jan 2020 00:00:00 +0000Differential Privacy with TensorFlowSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2019-12-20-differential-privacy
Differential Privacy guarantees that results of a database query are basically independent of the presence in the data of a single individual. Applied to machine learning, we expect that no single training example influences the parameters of the trained model in a substantial way. This post introduces TensorFlow Privacy, a library built on top of TensorFlow, that can be used to train differentially private deep learning models from R.Privacy & SecurityTensorFlow/KerasTime Serieshttps://blogs.rstudio.com/tensorflow/posts/2019-12-20-differential-privacyFri, 20 Dec 2019 00:00:00 +0000tfhub: R interface to TensorFlow HubDaniel Falbel
https://blogs.rstudio.com/tensorflow/posts/2019-12-18-tfhub-0.7.0
TensorFlow Hub is a library for the publication, discovery, and consumption of reusable parts of machine learning models. A module is a self-contained piece of a TensorFlow graph, along with its weights and assets, that can be reused across different tasks in a process known as transfer learning.TensorFlow/KerasPackages/Releaseshttps://blogs.rstudio.com/tensorflow/posts/2019-12-18-tfhub-0.7.0Wed, 18 Dec 2019 00:00:00 +0000Gaussian Process Regression with tfprobabilitySigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2019-12-10-variational-gaussian-process
Continuing our tour of applications of TensorFlow Probability (TFP), after Bayesian Neural Networks, Hamiltonian Monte Carlo and State Space Models, here we show an example of Gaussian Process Regression. In fact, what we see is a rather "normal" Keras network, defined and trained in pretty much the usual way, with TFP's Variational Gaussian Process layer pulling off all the magic.Probabilistic ML/DLTensorFlow/Kerashttps://blogs.rstudio.com/tensorflow/posts/2019-12-10-variational-gaussian-processTue, 10 Dec 2019 00:00:00 +0000Getting started with Keras from R - the 2020 editionSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2019-11-27-gettingstarted-2020
Looking for materials to get started with deep learning from R? This post presents useful tutorials, guides, and background documentation on the new TensorFlow for R website. Advanced users will find pointers to applications of new release 2.0 (or upcoming 2.1!) features alluded to in the recent TensorFlow 2.0 post.Packages/ReleasesTensorFlow/Kerashttps://blogs.rstudio.com/tensorflow/posts/2019-11-27-gettingstarted-2020Wed, 27 Nov 2019 00:00:00 +0000Variational convnets with tfprobabilitySigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2019-11-13-variational-convnet
In a Bayesian neural network, layer weights are distributions, not tensors. Using tfprobability, the R wrapper to TensorFlow Probability, we can build regular Keras models that have probabilistic layers, and thus get uncertainty estimates "for free". In this post, we show how to define, train and obtain predictions from a probabilistic convolutional neural network.Probabilistic ML/DLTime SeriesTensorFlow/Kerashttps://blogs.rstudio.com/tensorflow/posts/2019-11-13-variational-convnetWed, 13 Nov 2019 00:00:00 +0000tfprobability 0.8 on CRAN: Now how can you use it?Sigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2019-11-07-tfp-cran
Part of the r-tensorflow ecosystem, tfprobability is an R wrapper to TensorFlow Probability, the Python probabilistic programming framework developed by Google. We take the occasion of tfprobability's acceptance on CRAN to give a high-level introduction, highlighting interesting use cases and applications.Probabilistic ML/DLPackages/ReleasesTensorFlow/Kerashttps://blogs.rstudio.com/tensorflow/posts/2019-11-07-tfp-cranThu, 07 Nov 2019 00:00:00 +0000Innocent unicorns considered harmful? How to experiment with GPT-2 from RSigrid KeydanaJavier Luraschi
https://blogs.rstudio.com/tensorflow/posts/2019-10-23-gpt-2
Is society ready to deal with challenges brought about by artificially-generated information - fake images, fake videos, fake text? While this post won't answer that question, it should help form an opinion on the threat exerted by fake text as of this writing, autumn 2019. We introduce gpt2, an R package that wraps OpenAI's public implementation of GPT-2, the language model that early this year surprised the NLP community with the unprecedented quality of its creations.Natural Language ProcessingPackages/Releaseshttps://blogs.rstudio.com/tensorflow/posts/2019-10-23-gpt-2Wed, 23 Oct 2019 00:00:00 +0000TensorFlow 2.0 is here - what changes for R users?Sigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2019-10-08-tf2-whatchanges
TensorFlow 2.0 was finally released last week. As R users we have two kinds of questions. First, will my keras code still run? And second, what is it that changes? In this post, we answer both and, then, give a tour of exciting new developments in the r-tensorflow ecosystem.TensorFlow/KerasPackages/Releaseshttps://blogs.rstudio.com/tensorflow/posts/2019-10-08-tf2-whatchangesTue, 08 Oct 2019 00:00:00 +0000On leapfrogs, crashing satellites, and going nuts: A very first conceptual introduction to Hamiltonian Monte CarloSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2019-10-03-intro-to-hmc
TensorFlow Probability, and its R wrapper tfprobability, provide Markov Chain Monte Carlo (MCMC) methods that were used in a number of recent posts on this blog. These posts were directed to users already comfortable with the method, and terminology, per se, which readers mainly interested in deep learning won't necessarily be. Here we try to make up leeway, introducing Hamiltonian Monte Carlo (HMC) as well as a few often-heard "buzzwords" accompanying it, always striving to keep in mind what it is all "for".Bayesian ModelingConceptshttps://blogs.rstudio.com/tensorflow/posts/2019-10-03-intro-to-hmcThu, 03 Oct 2019 00:00:00 +0000BERT from RTurgut Abdullayev
https://blogs.rstudio.com/tensorflow/posts/2019-09-30-bert-r
A deep learning model - BERT from Google AI Research - has yielded state-of-the-art results in a wide variety of Natural Language Processing (NLP) tasks. In this tutorial, we will show how to load and train the BERT model from R, using Keras.Natural Language ProcessingTensorFlow/Kerashttps://blogs.rstudio.com/tensorflow/posts/2019-09-30-bert-rMon, 30 Sep 2019 00:00:00 +0000So, how come we can use TensorFlow from R?Sigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2019-08-29-using-tf-from-r
Have you ever wondered why you can call TensorFlow - mostly known as a Python framework - from R? If not - that's how it should be, as the R packages keras and tensorflow aim to make this process as transparent as possible to the user. But for them to be those helpful genies, someone else first has to tame the Python.TensorFlow/KerasMetaConceptshttps://blogs.rstudio.com/tensorflow/posts/2019-08-29-using-tf-from-rThu, 29 Aug 2019 00:00:00 +0000Image segmentation with U-NetDaniel FalbelSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2019-08-23-unet
In image segmentation, every pixel of an image is assigned a class. Depending on the application, classes could be different cell types; or the task could be binary, as in "cancer cell yes or no?". Area of application notwithstanding, the established neural network architecture of choice is U-Net. In this post, we show how to preprocess data and train a U-Net model on the Kaggle Carvana image segmentation data.Image Recognition & Image ProcessingTensorFlow/Kerashttps://blogs.rstudio.com/tensorflow/posts/2019-08-23-unetFri, 23 Aug 2019 00:00:00 +0000Modeling censored data with tfprobabilitySigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2019-07-31-censored-data
In this post we use tfprobability, the R interface to TensorFlow Probability, to model censored data. Again, the exposition is inspired by the treatment of this topic in Richard McElreath's Statistical Rethinking. Instead of cute cats though, we model immaterial entities from the cold world of technology: This post explores durations of CRAN package checks, a dataset that comes with Max Kuhn's parsnip.Bayesian ModelingTensorFlow/Kerashttps://blogs.rstudio.com/tensorflow/posts/2019-07-31-censored-dataWed, 31 Jul 2019 00:00:00 +0000TensorFlow feature columns: Transforming your data recipes-styleDaniel FalbelSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2019-07-09-feature-columns
TensorFlow feature columns provide useful functionality for preprocessing categorical data and chaining transformations, like bucketization or feature crossing. From R, we use them in popular "recipes" style, creating and subsequently refining a feature specification. In this post, we show how using feature specs frees cognitive resources and lets you focus on what you really want to accomplish. What's more, because of its elegance, feature-spec code reads nicely and is fun to write as well.TensorFlow/KerasTabular Datahttps://blogs.rstudio.com/tensorflow/posts/2019-07-09-feature-columnsTue, 09 Jul 2019 00:00:00 +0000Dynamic linear models with tfprobabilitySigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2019-06-25-dynamic_linear_models_tfprobability
Previous posts featuring tfprobability - the R interface to TensorFlow Probability - have focused on enhancements to deep neural networks (e.g., introducing Bayesian uncertainty estimates) and fitting hierarchical models with Hamiltonian Monte Carlo. This time, we show how to fit time series using dynamic linear models (DLMs), yielding posterior predictive forecasts as well as the smoothed and filtered estimates from the Kálmán filter.Probabilistic ML/DLTime Serieshttps://blogs.rstudio.com/tensorflow/posts/2019-06-25-dynamic_linear_models_tfprobabilityMon, 24 Jun 2019 00:00:00 +0000Adding uncertainty estimates to Keras models with tfprobabilitySigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2019-06-05-uncertainty-estimates-tfprobability
As of today, there is no mainstream road to obtaining uncertainty estimates from neural networks. All that can be said is that, normally, approaches tend to be Bayesian in spirit, involving some way of putting a prior over model weights. This holds true as well for the method presented in this post: We show how to use tfprobability, the R interface to TensorFlow Probability, to add uncertainty estimates to a Keras model in an elegant and conceptually plausible way.Probabilistic ML/DLTensorFlow/KerasConceptshttps://blogs.rstudio.com/tensorflow/posts/2019-06-05-uncertainty-estimates-tfprobabilityWed, 05 Jun 2019 00:00:00 +0000Hierarchical partial pooling, continued: Varying slopes models with TensorFlow ProbabilitySigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2019-05-24-varying-slopes
This post builds on our recent introduction to multi-level modeling with tfprobability, the R wrapper to TensorFlow Probability. We show how to pool not just mean values ("intercepts"), but also relationships ("slopes"), thus enabling models to learn from data in an even broader way. Again, we use an example from Richard McElreath's "Statistical Rethinking"; the terminology as well as the way we present this topic are largely owed to this book.Bayesian ModelingTensorFlow/Kerashttps://blogs.rstudio.com/tensorflow/posts/2019-05-24-varying-slopesFri, 24 May 2019 00:00:00 +0000Tadpoles on TensorFlow: Hierarchical partial pooling with tfprobabilitySigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2019-05-06-tadpoles-on-tensorflow
This post is a first introduction to MCMC modeling with tfprobability, the R interface to TensorFlow Probability (TFP). Our example is a multi-level model describing tadpole mortality, which may be known to the reader from Richard McElreath's wonderful "Statistical Rethinking".Bayesian ModelingTensorFlow/Kerashttps://blogs.rstudio.com/tensorflow/posts/2019-05-06-tadpoles-on-tensorflowMon, 06 May 2019 00:00:00 +0000Experimenting with autoregressive flows in TensorFlow ProbabilitySigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2019-04-24-autoregressive-flows
Continuing from the recent introduction to bijectors in TensorFlow Probability (TFP), this post brings autoregressivity to the table. Using TFP through the new R package tfprobability, we look at the implementation of masked autoregressive flows (MAF) and put them to use on two different datasets.Probabilistic ML/DLUnsupervised LearningTensorFlow/Kerashttps://blogs.rstudio.com/tensorflow/posts/2019-04-24-autoregressive-flowsWed, 24 Apr 2019 00:00:00 +0000Auto-Keras: Tuning-free deep learning from RJuan Cruz Rodriguez
https://blogs.rstudio.com/tensorflow/posts/2019-04-16-autokeras
Sometimes in deep learning, architecture design and hyperparameter tuning pose substantial challenges. Using Auto-Keras, none of these is needed: We start a search procedure and extract the best-performing model. This post presents Auto-Keras in action on the well-known MNIST dataset.TensorFlow/KerasPackages/Releaseshttps://blogs.rstudio.com/tensorflow/posts/2019-04-16-autokerasTue, 16 Apr 2019 00:00:00 +0000Getting into the flow: Bijectors in TensorFlow ProbabilitySigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2019-04-05-bijectors-flows
Normalizing flows are one of the lesser known, yet fascinating and successful architectures in unsupervised deep learning. In this post we provide a basic introduction to flows using tfprobability, an R wrapper to TensorFlow Probability. Upcoming posts will build on this, using more complex flows on more complex data.Probabilistic ML/DLTensorFlow/KerasConceptsUnsupervised Learninghttps://blogs.rstudio.com/tensorflow/posts/2019-04-05-bijectors-flowsFri, 05 Apr 2019 00:00:00 +0000Math, code, concepts: A third road to deep learningSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2019-03-15-concepts-way-to-dl
Not everybody who wants to get into deep learning has a strong background in math or programming. This post elaborates on a concepts-driven, abstraction-based way to learn what it's all about.MetaConceptshttps://blogs.rstudio.com/tensorflow/posts/2019-03-15-concepts-way-to-dlFri, 15 Mar 2019 00:00:00 +0000Audio classification with Keras: Looking closer at the non-deep learning partsSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2019-02-07-audio-background
Sometimes, deep learning is seen - and welcomed - as a way to avoid laborious preprocessing of data. However, there are cases where preprocessing of sorts not only helps improve prediction, but constitutes a fascinating topic in itself. One such case is audio classification. In this post, we build on a previous post on this blog, this time focusing on explaining some of the non-deep learning background. We then link the concepts explained to TensorFlow code updated for near-future releases.TensorFlow/KerasConceptsAudio Processinghttps://blogs.rstudio.com/tensorflow/posts/2019-02-07-audio-backgroundThu, 07 Feb 2019 00:00:00 +0000Discrete Representation Learning with VQ-VAE and TensorFlow ProbabilitySigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2019-01-24-vq-vae
Mostly when thinking of Variational Autoencoders (VAEs), we picture the prior as an isotropic Gaussian. But this is by no means a necessity. The Vector Quantised Variational Autoencoder (VQ-VAE) described in van den Oord et al.'s "Neural Discrete Representation Learning" features a discrete latent space that makes it possible to learn impressively concise latent representations. In this post, we combine elements of Keras, TensorFlow, and TensorFlow Probability to see if we can generate convincing letters resembling those in Kuzushiji-MNIST.TensorFlow/KerasProbabilistic ML/DLUnsupervised Learninghttps://blogs.rstudio.com/tensorflow/posts/2019-01-24-vq-vaeThu, 24 Jan 2019 00:00:00 +0000Getting started with TensorFlow Probability from RSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2019-01-08-getting-started-with-tf-probability
TensorFlow Probability offers a vast range of functionality ranging from distributions over probabilistic network layers to probabilistic inference. It works seamlessly with core TensorFlow and (TensorFlow) Keras. In this post, we provide a short introduction to the distributions layer and then, use it for sampling and calculating probabilities in a Variational Autoencoder.TensorFlow/KerasProbabilistic ML/DLUnsupervised Learninghttps://blogs.rstudio.com/tensorflow/posts/2019-01-08-getting-started-with-tf-probabilityTue, 08 Jan 2019 00:00:00 +0000Concepts in object detectionSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2018-12-18-object-detection-concepts
As shown in a previous post, naming and locating a single object in an image is a task that may be approached in a straightforward way. This is not the same with general object detection, though - naming and locating several objects at once, with no prior information about how many objects are supposed to be detected.
In this post, we explain the steps involved in coding a basic single-shot object detector: Not unlike SSD (Single-shot Multibox Detector), but simplified and designed not for best performance but for comprehensibility.TensorFlow/KerasImage Recognition & Image Processinghttps://blogs.rstudio.com/tensorflow/posts/2018-12-18-object-detection-conceptsTue, 18 Dec 2018 00:00:00 +0000Entity embeddings for fun and profitSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2018-11-26-embeddings-fun-and-profit
Embedding layers are not just useful when working with language data. As "entity embeddings", they've recently become famous for applications on tabular, small-scale data. In this post, we exemplify two possible use cases, also drawing attention to what not to expect.TensorFlow/KerasTabular Datahttps://blogs.rstudio.com/tensorflow/posts/2018-11-26-embeddings-fun-and-profitMon, 26 Nov 2018 00:00:00 +0000You sure? A Bayesian approach to obtaining uncertainty estimates from neural networksSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2018-11-12-uncertainty_estimates_dropout
In deep learning, there is no obvious way of obtaining uncertainty estimates. In 2016, Gal and Ghahramani proposed a method that is both theoretically grounded and practical: use dropout at test time. In this post, we introduce a refined version of this method (Gal et al. 2017) that has the network itself learn how uncertain it is.Image Recognition & Image ProcessingProbabilistic ML/DLTensorFlow/Kerashttps://blogs.rstudio.com/tensorflow/posts/2018-11-12-uncertainty_estimates_dropoutMon, 12 Nov 2018 00:00:00 +0000Naming and locating objects in imagesSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2018-11-05-naming-locating-objects
Object detection (the act of classifying and localizing multiple objects in a scene) is one of the more difficult, but in practice very relevant, deep learning tasks. We'll build up to it in several posts. Here we start with the simpler tasks of naming and locating a single object.TensorFlow/KerasImage Recognition & Image Processinghttps://blogs.rstudio.com/tensorflow/posts/2018-11-05-naming-locating-objectsMon, 05 Nov 2018 00:00:00 +0000Representation learning with MMD-VAESigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2018-10-22-mmd-vae
Like GANs, variational autoencoders (VAEs) are often used to generate images. However, VAEs add an additional promise: namely, to model an underlying latent space. Here, we first look at a typical implementation that maximizes the evidence lower bound. Then, we compare it to one of the more recent competitors, MMD-VAE, from the Info-VAE (information maximizing VAE) family.TensorFlow/KerasUnsupervised LearningImage Recognition & Image Processinghttps://blogs.rstudio.com/tensorflow/posts/2018-10-22-mmd-vaeMon, 22 Oct 2018 00:00:00 +0000Winner takes all: A look at activations and cost functionsSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2018-10-11-activations-intro
Why do we use the activations we use, and how do they relate to the cost functions they tend to co-appear with? In this post we provide a conceptual introduction.TensorFlow/KerasConceptshttps://blogs.rstudio.com/tensorflow/posts/2018-10-11-activations-introThu, 11 Oct 2018 00:00:00 +0000More flexible models with TensorFlow eager execution and KerasSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2018-10-02-eager-wrapup
Advanced applications like generative adversarial networks, neural style transfer, and the attention mechanism ubiquitous in natural language processing used to be not-so-simple to implement with the Keras declarative coding paradigm. Now, with the advent of TensorFlow eager execution, things have changed. This post explores using eager execution with R.TensorFlow/Kerashttps://blogs.rstudio.com/tensorflow/posts/2018-10-02-eager-wrapupTue, 02 Oct 2018 00:00:00 +0000Collaborative filtering with embeddingsSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2018-09-26-embeddings-recommender
Embeddings are not just for use in natural language processing. Here we apply embeddings to a common task in collaborative filtering - predicting user ratings - and on our way, strive for a better understanding of what an embedding layer really does.TensorFlow/KerasTabular Datahttps://blogs.rstudio.com/tensorflow/posts/2018-09-26-embeddings-recommenderWed, 26 Sep 2018 00:00:00 +0000Image-to-image translation with pix2pixSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2018-09-20-eager-pix2pix
Conditional GANs (cGANs) may be used to generate one type of object based on another - e.g., a map based on a photo, or a color video based on black-and-white. Here, we show how to implement the pix2pix approach with Keras and eager execution.TensorFlow/KerasImage Recognition & Image ProcessingUnsupervised Learninghttps://blogs.rstudio.com/tensorflow/posts/2018-09-20-eager-pix2pixThu, 20 Sep 2018 00:00:00 +0000Attention-based Image Captioning with KerasSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2018-09-17-eager-captioning
Image captioning is a challenging task at the intersection of vision and language. Here, we demonstrate using Keras and eager execution to incorporate an attention mechanism that allows the network to concentrate on image features relevant to the current state of text generation.Natural Language ProcessingTensorFlow/KerasImage Recognition & Image Processinghttps://blogs.rstudio.com/tensorflow/posts/2018-09-17-eager-captioningMon, 17 Sep 2018 00:00:00 +0000Neural style transfer with eager execution and KerasSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2018-09-10-eager-style-transfer
Continuing our series on combining Keras with TensorFlow eager execution, we show how to implement neural style transfer in a straightforward way. Based on this easy-to-adapt example, you can perform style transfer on your own images.TensorFlow/KerasUnsupervised LearningImage Recognition & Image Processinghttps://blogs.rstudio.com/tensorflow/posts/2018-09-10-eager-style-transferMon, 10 Sep 2018 00:00:00 +0000Getting started with deep learning in RSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2018-09-07-getting-started
Many fields are benefiting from the use of deep learning, and with the R keras, tensorflow, and related packages, you can now easily do state-of-the-art deep learning in R. In this post, we want to give some orientation as to how best to get started.TensorFlow/Kerashttps://blogs.rstudio.com/tensorflow/posts/2018-09-07-getting-startedFri, 07 Sep 2018 00:00:00 +0000Generating images with Keras and TensorFlow eager executionSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2018-08-26-eager-dcgan
Generative adversarial networks (GANs) are a popular deep learning approach to generating new entities (often but not always images). We show how to code them using Keras and TensorFlow eager execution.TensorFlow/KerasUnsupervised LearningImage Recognition & Image Processinghttps://blogs.rstudio.com/tensorflow/posts/2018-08-26-eager-dcganSun, 26 Aug 2018 00:00:00 +0000Attention-based Neural Machine Translation with KerasSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2018-07-30-attention-layer
As sequence-to-sequence prediction tasks get more involved, attention mechanisms have proven helpful. A prominent example is neural machine translation. Following a recent Google Colaboratory notebook, we show how to implement attention in R.Natural Language ProcessingTensorFlow/Kerashttps://blogs.rstudio.com/tensorflow/posts/2018-07-30-attention-layerMon, 30 Jul 2018 00:00:00 +0000Classifying physical activity from smartphone dataNick Strayer
https://blogs.rstudio.com/tensorflow/posts/2018-07-17-activity-detection
Using Keras to train a convolutional neural network to classify physical activity. The dataset was built from the recordings of 30 subjects performing basic activities and postural transitions while carrying a waist-mounted smartphone with embedded inertial sensors.https://blogs.rstudio.com/tensorflow/posts/2018-07-17-activity-detectionTue, 17 Jul 2018 00:00:00 +0000Predicting Sunspot Frequency with KerasMatt DanchoSigrid Keydana
https://blogs.rstudio.com/tensorflow/posts/2018-06-25-sunspots-lstm
In this post we will examine making time series predictions using the sunspots dataset that ships with base R. Sunspots are dark spots on the sun, associated with lower temperature. Our post will focus on both how to apply deep learning to time series forecasting, and how to properly apply cross validation in this domain.TensorFlow/KerasTime Serieshttps://blogs.rstudio.com/tensorflow/posts/2018-06-25-sunspots-lstmMon, 25 Jun 2018 00:00:00 +0000Simple Audio Classification with KerasDaniel Falbel
https://blogs.rstudio.com/tensorflow/posts/2018-06-06-simple-audio-classification-keras
In this tutorial we will build a deep learning model to classify words. We will use the Speech Commands dataset, which consists of 65,000 one-second audio files of people saying 30 different words.TensorFlow/KerasAudio Processinghttps://blogs.rstudio.com/tensorflow/posts/2018-06-06-simple-audio-classification-kerasWed, 06 Jun 2018 00:00:00 +0000GPU Workstations in the Cloud with PaperspaceJ.J. Allaire
https://blogs.rstudio.com/tensorflow/posts/2018-04-02-rstudio-gpu-paperspace
If you don't have local access to a modern NVIDIA GPU, your best bet is typically to run GPU-intensive training jobs in the cloud. Paperspace is a cloud service that provides access to a fully preconfigured Ubuntu 16.04 desktop environment equipped with a GPU.Cloudhttps://blogs.rstudio.com/tensorflow/posts/2018-04-02-rstudio-gpu-paperspaceMon, 02 Apr 2018 00:00:00 +0000lime v0.4: The Kitten Picture EditionThomas Lin Pedersen
https://blogs.rstudio.com/tensorflow/posts/2018-03-09-lime-v04-the-kitten-picture-edition
A new major release of lime has landed on CRAN. lime is an R port of the Python library of the same name by Marco Ribeiro that allows the user to pry open black box machine learning models and explain their outcomes on a per-observation basis.Packages/ReleasesTensorFlow/KerasExplainabilityImage Recognition & Image Processinghttps://blogs.rstudio.com/tensorflow/posts/2018-03-09-lime-v04-the-kitten-picture-editionFri, 09 Mar 2018 00:00:00 +0000Deep Learning for Cancer ImmunotherapyLeon Eyrich Jessen
https://blogs.rstudio.com/tensorflow/posts/2018-01-29-dl-for-cancer-immunotherapy
The aim of this post is to illustrate how deep learning is being applied in cancer immunotherapy (Immuno-oncology or Immunooncology), a cancer treatment strategy that seeks to use the patient's own immune system to fight the cancer.TensorFlow/KerasTabular Datahttps://blogs.rstudio.com/tensorflow/posts/2018-01-29-dl-for-cancer-immunotherapyMon, 29 Jan 2018 00:00:00 +0000Predicting Fraud with Autoencoders and KerasDaniel Falbel
https://blogs.rstudio.com/tensorflow/posts/2018-01-24-keras-fraud-autoencoder
In this post we will train an autoencoder to detect credit card fraud. We will also demonstrate how to train Keras models in the cloud using CloudML. The basis of our model will be the Kaggle Credit Card Fraud Detection dataset.TensorFlow/KerasUnsupervised LearningCloudhttps://blogs.rstudio.com/tensorflow/posts/2018-01-24-keras-fraud-autoencoderThu, 25 Jan 2018 00:00:00 +0000Analyzing rtweet Data with kerasformulaPete Mohanty
https://blogs.rstudio.com/tensorflow/posts/2018-01-24-analyzing-rtweet-data-with-kerasformula
The kerasformula package offers a high-level interface for the R interface to Keras. Its main interface is the kms function, a regression-style interface to keras_model_sequential that uses formulas and sparse matrices. We use kerasformula to predict how popular tweets will be based on how often the tweet was retweeted and favorited.TensorFlow/KerasNatural Language Processinghttps://blogs.rstudio.com/tensorflow/posts/2018-01-24-analyzing-rtweet-data-with-kerasformulaWed, 24 Jan 2018 00:00:00 +0000Deep Learning With Keras To Predict Customer ChurnMatt Dancho
https://blogs.rstudio.com/tensorflow/posts/2018-01-11-keras-customer-churn
Using Keras to predict customer churn based on the IBM Watson Telco Customer Churn dataset. We also demonstrate using the lime package to help explain which features drive individual model predictions. In addition, we use three new packages to assist with Machine Learning: recipes for preprocessing, rsample for sampling data and yardstick for model metrics.TensorFlow/KerasTabular DataExplainabilityhttps://blogs.rstudio.com/tensorflow/posts/2018-01-11-keras-customer-churnThu, 11 Jan 2018 00:00:00 +0000R Interface to Google CloudMLJ.J. Allaire
https://blogs.rstudio.com/tensorflow/posts/2018-01-10-r-interface-to-cloudml
We are excited to announce the availability of the cloudml package, which provides an R interface to Google Cloud Machine Learning Engine. CloudML provides a number of services including on-demand access to training on GPUs and hyperparameter tuning to optimize key attributes of model architectures.CloudPackages/Releaseshttps://blogs.rstudio.com/tensorflow/posts/2018-01-10-r-interface-to-cloudmlWed, 10 Jan 2018 00:00:00 +0000Classifying Duplicate Questions from Quora with KerasDaniel Falbel
https://blogs.rstudio.com/tensorflow/posts/2018-01-09-keras-duplicate-questions-quora
In this post we will use Keras to classify duplicated questions from Quora. Our implementation is inspired by the Siamese Recurrent Architecture, with modifications to the similarity measure and the embedding layers (the original paper uses pre-trained word vectors).TensorFlow/KerasNatural Language Processinghttps://blogs.rstudio.com/tensorflow/posts/2018-01-09-keras-duplicate-questions-quoraTue, 09 Jan 2018 00:00:00 +0000Word Embeddings with KerasDaniel Falbel
https://blogs.rstudio.com/tensorflow/posts/2017-12-22-word-embeddings-with-keras
Word embedding is a method used to map words of a vocabulary to dense vectors of real numbers where semantically similar words are mapped to nearby points. In this example we'll use Keras to generate word embeddings for the Amazon Fine Foods Reviews dataset.TensorFlow/KerasNatural Language Processinghttps://blogs.rstudio.com/tensorflow/posts/2017-12-22-word-embeddings-with-kerasFri, 22 Dec 2017 00:00:00 +0000Time Series Forecasting with Recurrent Neural NetworksFrançois CholletJ.J. Allaire
https://blogs.rstudio.com/tensorflow/posts/2017-12-20-time-series-forecasting-with-recurrent-neural-networks
In this post, we'll review three advanced techniques for improving the performance and generalization power of recurrent neural networks. We'll demonstrate all three concepts on a temperature-forecasting problem, where you have access to a time series of data points coming from sensors installed on the roof of a building.TensorFlow/KerasTime Serieshttps://blogs.rstudio.com/tensorflow/posts/2017-12-20-time-series-forecasting-with-recurrent-neural-networksWed, 20 Dec 2017 00:00:00 +0000Image Classification on Small Datasets with KerasFrançois CholletJ.J. Allaire
https://blogs.rstudio.com/tensorflow/posts/2017-12-14-image-classification-on-small-datasets
Having to train an image-classification model using very little data is a common situation. In this article we review three techniques for tackling this problem, including feature extraction and fine-tuning from a pretrained network.TensorFlow/KerasImage Recognition & Image Processinghttps://blogs.rstudio.com/tensorflow/posts/2017-12-14-image-classification-on-small-datasetsThu, 14 Dec 2017 00:00:00 +0000Deep Learning for Text Classification with KerasFrançois CholletJ.J. Allaire
https://blogs.rstudio.com/tensorflow/posts/2017-12-07-text-classification-with-keras
Two-class classification, or binary classification, may be the most widely applied kind of machine-learning problem. In this excerpt from the book Deep Learning with R, you'll learn to classify movie reviews as positive or negative, based on the text content of the reviews.TensorFlow/KerasNatural Language Processinghttps://blogs.rstudio.com/tensorflow/posts/2017-12-07-text-classification-with-kerasThu, 07 Dec 2017 00:00:00 +0000tfruns: Tools for TensorFlow Training RunsJ.J. Allaire
https://blogs.rstudio.com/tensorflow/posts/2017-10-04-tfruns
The tfruns package provides a suite of tools for tracking, visualizing, and managing TensorFlow training runs and experiments from R.Packages/Releaseshttps://blogs.rstudio.com/tensorflow/posts/2017-10-04-tfrunsWed, 04 Oct 2017 00:00:00 +0000Keras for RJ.J. Allaire
https://blogs.rstudio.com/tensorflow/posts/2017-09-06-keras-for-r
We are excited to announce that the keras package is now available on CRAN. The package provides an R interface to Keras, a high-level neural networks API developed with a focus on enabling fast experimentation.TensorFlow/KerasPackages/Releaseshttps://blogs.rstudio.com/tensorflow/posts/2017-09-06-keras-for-rTue, 05 Sep 2017 00:00:00 +0000TensorFlow EstimatorsYuan Tang
https://blogs.rstudio.com/tensorflow/posts/2017-08-31-tensorflow-estimators-for-r
The tfestimators package is an R interface to TensorFlow Estimators, a high-level API that provides implementations of many different model types including linear models and deep neural networks.Packages/Releaseshttps://blogs.rstudio.com/tensorflow/posts/2017-08-31-tensorflow-estimators-for-rThu, 31 Aug 2017 00:00:00 +0000TensorFlow v1.3 ReleasedJ.J. Allaire
https://blogs.rstudio.com/tensorflow/posts/2017-08-17-tensorflow-v13-released
The final release of TensorFlow v1.3 is now available. This release marks the initial availability of several canned estimators including DNNClassifier and DNNRegressor.Packages/Releaseshttps://blogs.rstudio.com/tensorflow/posts/2017-08-17-tensorflow-v13-releasedThu, 17 Aug 2017 00:00:00 +0000