issues Search Results · repo:microsoft/unilm language:Python

1k results

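According to the paper, after BEiT is trained, the VFFN and attention are frozen and only the LFFN is trained. However, in the code you provided, I could not find any layers being frozen during textonlymlm. Could you tell me where in the code this is implemented?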
  • liu-zongxi
  • Opened 16 days ago
  • #1727

Hi there, I visited the link provided in the README, but access failed. The error message displayed was as follows: This XML file does not appear to have any style information associated ...
  • ruc-G
  • Opened 27 days ago
  • #1726

Describe the bug: ReSA. The problem arises when using: When I'm running eval_math_local.sh, it crashes with an import error. A clear and concise description of what the bug is. Console Output: ...
  • MengAiDev
  • 2
  • Opened 29 days ago
  • #1725

Thank you for publishing the BEiT v2 code! I’m pretraining BEiT v2 on a custom industrial dataset (1.1 M fault and normal images for training; 200 k normal images for validation) and have a few questions: ...
  • IKnowWhoo
  • 1
  • Opened on Jul 4
  • #1724

When will you open-source the data synthesis code for the paper "Scaling Laws of Synthetic Data for Language Model"?
  • butterluo
  • Opened on Jul 3
  • #1722

Hi UniLM team, The recent paper “Think Only When You Need with Large Hybrid-Reasoning Models” states that its code and models would be released in this repository. I’ve searched the repo (branches, ...
  • almogtavor
  • 1
  • Opened on May 31
  • #1719

I have trained a 1.3B model using both the Differential Transformer and the standard Transformer. I observed a slight improvement in LLM evaluation scores for the Differential Transformer variant, and ...
  • fasil-saidalavi
  • 5
  • Opened on May 20
  • #1718

Hi, I found a potential bug in the textdiffuser-2/inference_textdiffuser2_t2i_full.py file: current_ocr is overwritten with an empty list before iteration (line 558). In line 553, current_ocr is correctly ...
  • dogcdt
  • Opened on May 16
  • #1717

As I understand it, headdim is more important than the number of heads, and the diff transformer chooses to halve the number of heads and double the vdim compared to normal transformers. However, wouldn ...
  • RuiWang1998
  • 3
  • Opened on May 13
  • #1716

Hello, I have a question about how to use WavLM to separate mixed audio. Is it possible to use the speech features extracted by WavLM as the output of certain speech separation networks (such as ConvTasNet) ...
  • yangwyy
  • Opened on May 10
  • #1715