Here's a minimal code snippet for applying a pre-trained Spatial Transformer (non-clustering) to one image:
```python
from models import get_stn
from utils.download import download_model
from utils.vis_tools.helpers import load_pil
from torchvision.utils import save_image

resolution = 512  # resolution the input image will be resized to (this can be any power of 2)
input_img = load_pil('my_image.png', resolution)  # load the input image and resize to (1, C, resolution, resolution)
ckpt = download_model('cat')  # download model weights
stn = get_stn(['similarity', 'flow'], flow_size=128, supersize=resolution).to('cuda')  # instantiate STN
stn.load_state_dict(ckpt['t_ema'])  # load weights
aligned_img = stn(input_img, iters=3, output_resolution=resolution)  # forward pass through the STN
# save to disk; older torchvision releases name this argument `range` instead of `value_range`
save_image(aligned_img, 'output.png', normalize=True, value_range=(-1, 1))
```
If you're using the `celeba` or `cub` models, use `iters=1` instead. If your input image isn't square, you may want to pad or crop it beforehand. Also, `stn` supports batch mode, so `input_img` can be an `(N, C, H, W)` tensor containing multiple images, in which case `aligned_img` will also be `(N, C, H, W)`.
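The three practical notes above (choosing `iters`, squaring a non-square image, batching) can be sketched in plain NumPy. This is a minimal sketch, assuming the image is held as an `(H, W, C)` array before it is saved and handed to `load_pil`; `iters_for_model`, `pad_to_square` and `to_batch` are hypothetical helpers, not part of the repository.

```python
import numpy as np


def iters_for_model(model_name: str) -> int:
    """Hypothetical helper: 1 STN iteration for celeba/cub, 3 otherwise (per the note above)."""
    return 1 if model_name in ("celeba", "cub") else 3


def pad_to_square(img: np.ndarray, fill: float = 0.0) -> np.ndarray:
    """Hypothetical helper: center-pad an (H, W, C) array so that H == W.

    Padding (rather than stretching) keeps the aspect ratio intact when the
    image is later resized to `resolution x resolution`.
    """
    h, w = img.shape[:2]
    side = max(h, w)
    top = (side - h) // 2
    left = (side - w) // 2
    pad = ((top, side - h - top), (left, side - w - left), (0, 0))
    return np.pad(img, pad, mode="constant", constant_values=fill)


def to_batch(images):
    """Hypothetical helper: stack equally sized (H, W, C) arrays into one (N, C, H, W) array."""
    return np.stack([im.transpose(2, 0, 1) for im in images], axis=0)


if __name__ == "__main__":
    photo = np.zeros((300, 400, 3), dtype=np.float32)  # landscape placeholder image
    square = pad_to_square(photo)
    print(square.shape)  # (400, 400, 3)
    batch = to_batch([square, square])
    print(batch.shape, iters_for_model("celeba"))  # (2, 3, 400, 400) 1
```

With the tensors returned by `load_pil`, which are already `(1, C, resolution, resolution)`, `torch.cat(list_of_imgs, dim=0)` plays the role of `to_batch` and yields the `(N, C, H, W)` input that batch mode expects.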
How do I produce a single picture instead of a video?