Transformers.js 2.13 and 2.14 Released, Adding 8 New Architectures - 电子发烧友网 (Elecfans)

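Joshua Lochner, the author of Transformers.js, announced the release of Transformers.js v2.13 and v2.14 on GitHub. The updates are as follows (the links mentioned in the text are available in the original article):

8 new architectures! This release adds support for many new multimodal architectures, bringing the total number of supported architectures to 80!

1. VITS for multilingual text-to-speech in over 1,000 languages! (#466)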

import { pipeline } from '@xenova/transformers';


// Create English text-to-speech pipeline
const synthesizer = await pipeline('text-to-speech', 'Xenova/mms-tts-eng');


// Generate speech
const output = await synthesizer('I love transformers');
// {
//   audio: Float32Array(26112) [...],
//   sampling_rate: 16000
// }

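See here for a list of available models. To start, we have converted 12 of the ~1,140 models on the Hugging Face Hub. If the one you want is not among them, you can convert it yourself with our conversion script.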
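The pipeline returns raw samples rather than an audio file. As a minimal sketch (the `encodeWav` helper below is hypothetical and not part of Transformers.js), the `audio` Float32Array and `sampling_rate` from the output above can be wrapped in a standard mono 16-bit PCM WAV container so the result can be played or saved:

```javascript
// Hypothetical helper: wrap mono float samples in [-1, 1] in a 16-bit PCM WAV file.
// Assumes the { audio, sampling_rate } output shape shown above.
function encodeWav(audio, samplingRate) {
  const bytesPerSample = 2; // 16-bit PCM
  const dataSize = audio.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte header + sample data
  const view = new DataView(buffer);

  // Write ASCII chunk identifiers such as "RIFF" byte by byte.
  const writeString = (offset, str) => {
    for (let i = 0; i < str.length; ++i) view.setUint8(offset + i, str.charCodeAt(i));
  };

  writeString(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);                  // RIFF chunk size
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);                            // fmt chunk size
  view.setUint16(20, 1, true);                             // audio format: PCM
  view.setUint16(22, 1, true);                             // channels: mono
  view.setUint32(24, samplingRate, true);                  // sample rate
  view.setUint32(28, samplingRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);                // block align
  view.setUint16(34, 16, true);                            // bits per sample
  writeString(36, 'data');
  view.setUint32(40, dataSize, true);                      // data chunk size

  // Clamp each sample and scale it to the signed 16-bit range (little-endian).
  for (let i = 0; i < audio.length; ++i) {
    const s = Math.max(-1, Math.min(1, audio[i]));
    view.setInt16(44 + i * bytesPerSample, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In Node.js, something like `fs.writeFileSync('out.wav', Buffer.from(encodeWav(output.audio, output.sampling_rate)))` would then write a playable file.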

2. CLIPSeg for zero-shot image segmentation. (#478)

import { AutoTokenizer, AutoProcessor, CLIPSegForImageSegmentation, RawImage } from '@xenova/transformers';


// Load tokenizer, processor, and model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/clipseg-rd64-refined');
const processor = await AutoProcessor.from_pretrained('Xenova/clipseg-rd64-refined');
const model = await CLIPSegForImageSegmentation.from_pretrained('Xenova/clipseg-rd64-refined');


// Run tokenization
const texts = ['a glass', 'something to fill', 'wood', 'a jar'];
const text_inputs = tokenizer(texts, { padding: true, truncation: true });


// Read image and run processor
const image = await RawImage.read('https://github.com/timojl/clipseg/blob/master/example_image.jpg?raw=true');
const image_inputs = await processor(image);


// Run model with both text and pixel inputs
const { logits } = await model({ ...text_inputs, ...image_inputs });
// logits: Tensor {
//   dims: [4, 352, 352],
//   type: 'float32',
//   data: Float32Array(495616)[ ... ],
//   size: 495616
// }

You can visualize the predictions as follows:

const preds = logits
  .unsqueeze_(1)
  .sigmoid_()
  .mul_(255)
  .round_()
  .to('uint8');


// Save one grayscale PNG per text prompt (RawImage.save is async)
for (let i = 0; i < preds.dims[0]; ++i) {
  const img = RawImage.fromTensor(preds[i]);
  await img.save(`prediction_${i}.png`);
}

(Figure: the original image, followed by the predicted masks for the prompts "a glass", "something to fill", "wood", and "a jar".)

See here for a list of available models.
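The tensor chain in the visualization code applies a simple per-pixel transform: sigmoid turns each logit into a probability, which is then scaled to 0-255 and rounded to a byte. A minimal sketch of the same transform on plain arrays, assuming the flattened `[num_prompts, height, width]` logits layout shown above (both helpers are hypothetical, not library functions):

```javascript
// Hypothetical helper: logit -> sigmoid -> scale to [0, 255] -> round -> uint8.
function logitsToGrayscale(logits) {
  const out = new Uint8Array(logits.length);
  for (let i = 0; i < logits.length; ++i) {
    const p = 1 / (1 + Math.exp(-logits[i])); // probability that the pixel matches the prompt
    out[i] = Math.round(p * 255);             // 0 = background, 255 = confident match
  }
  return out;
}

// Each prompt's mask occupies height * width consecutive values
// in the flattened [num_prompts, height, width] logits.
function maskForPrompt(logits, promptIndex, height, width) {
  const size = height * width;
  return logitsToGrayscale(logits.slice(promptIndex * size, (promptIndex + 1) * size));
}
```

With the example above, `maskForPrompt(logits.data, 3, 352, 352)` would give the grayscale mask for "a jar".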

3. SegFormer for semantic segmentation and image classification. (#480)

import { pipeline } from '@xenova/transformers';


// Create an image segmentation pipeline
const segmenter = await pipeline('image-segmentation', 'Xenova/segformer_b2_clothes');


// Segment an image
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/young-man-standing-and-leaning-on-car.jpg';
const output = await segmenter(url);
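The example does not print the output. Conceptually, a semantic segmentation model such as SegFormer produces one score map per class, and each pixel is assigned its highest-scoring class; per-class masks are then read off that label map. A minimal sketch of this argmax step on plain arrays, assuming a flattened channel-first `[numClasses, height, width]` logits layout (the helper names are hypothetical, and this is not the pipeline's internal implementation):

```javascript
// Assign every pixel the class with the highest logit.
function argmaxLabelMap(logits, numClasses, height, width) {
  const size = height * width;
  const labels = new Int32Array(size);
  for (let p = 0; p < size; ++p) {
    let best = 0;
    let bestScore = logits[p]; // class 0 score for pixel p
    for (let c = 1; c < numClasses; ++c) {
      const score = logits[c * size + p];
      if (score > bestScore) {
        bestScore = score;
        best = c;
      }
    }
    labels[p] = best;
  }
  return labels;
}

// Binary mask (1 = pixel belongs to the class) for one class id.
function maskForClass(labels, classId) {
  return Uint8Array.from(labels, (l) => (l === classId ? 1 : 0));
}
```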

4. Table Transformer for extracting tables from unstructured documents. (#477)

import { pipeline } from '@xenova/transformers';


// Create an object detection pipeline
const detector = await pipeline('object-detection', 'Xenova/table-transformer-detection', { quantized: false });


// Detect tables in an image
const img = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/invoice-with-table.png';
const output = await detector(img);
// [{ score: 0.9967531561851501, label: 'table', box: { xmin: 52, ymin: 322, xmax: 546, ymax: 525 } }]
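Each detection carries a pixel-space bounding box. A minimal sketch of turning the output above into crop rectangles, for example to cut the table region out of the page; the `tableCrops` helper and its 0.9 score threshold are hypothetical choices:

```javascript
// Input: detections shaped like { score, label, box: { xmin, ymin, xmax, ymax } }.
// Keep confident detections of one label and convert boxes to x/y/width/height.
function tableCrops(detections, minScore = 0.9, label = 'table') {
  return detections
    .filter((d) => d.label === label && d.score >= minScore)
    .map(({ box }) => ({
      x: box.xmin,
      y: box.ymin,
      width: box.xmax - box.xmin,
      height: box.ymax - box.ymin,
    }));
}
```

For the output above, `tableCrops(output)` returns `[{ x: 52, y: 322, width: 494, height: 203 }]`.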

5. DiT for document image classification. (#474)

import { pipeline } from '@xenova/transformers';


// Create an image classification pipeline
const classifier = await pipeline('image-classification', 'Xenova/dit-base-finetuned-rvlcdip');


// Classify an image 
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/coca_cola_advertisement.png';
const output = await classifier(url);
// [{ label: 'advertisement', score: 0.9035086035728455 }]
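This checkpoint is fine-tuned on RVL-CDIP, which has 16 document classes (letter, invoice, advertisement, and so on). Assuming, as is usual for single-label classifiers, that the reported score is a softmax probability over those classes, a minimal sketch of the softmax plus top-k step looks like this (`topK` is a hypothetical helper, not the pipeline's code):

```javascript
// Softmax over raw logits, then return the k most likely { label, score } pairs.
function topK(logits, labels, k = 1) {
  const values = Array.from(logits); // accept plain arrays or Float32Array
  const max = Math.max(...values);   // subtract the max for numerical stability
  const exps = values.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps
    .map((e, i) => ({ label: labels[i], score: e / sum }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```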

6. SigLIP for zero-shot image classification. (#473)

import { pipeline } from '@xenova/transformers';


// Create a zero-shot image classification pipeline
const classifier = await pipeline('zero-shot-image-classification', 'Xenova/siglip-base-patch16-224');


// Classify images according to provided labels
const url = 'http://images.cocodataset.org/val2017/000000039769.jpg';
const output = await classifier(url, ['2 cats', '2 dogs'], {
    hypothesis_template: 'a photo of {}',
});
// [
//   { score: 0.16770583391189575, label: '2 cats' },
//   { score: 0.000022096000975579955, label: '2 dogs' }
// ]
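Note that the two scores above do not sum to 1. Unlike CLIP, which normalizes image-text similarities with a softmax across labels, SigLIP is trained with a pairwise sigmoid loss, so each label is scored independently. A minimal sketch of that scoring rule on precomputed embeddings; the embeddings, `logitScale`, and `logitBias` are placeholders for the model's own learned values, and this is not the pipeline's internal code:

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; ++i) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Fill the hypothesis template with each candidate label.
function buildPrompts(labels, template = 'a photo of {}') {
  return labels.map((l) => template.replace('{}', l));
}

// Independent sigmoid score per label: sigmoid(scale * cos + bias).
function siglipScores(imageEmbedding, textEmbeddings, labels, logitScale, logitBias) {
  return textEmbeddings.map((t, i) => {
    const logit = logitScale * cosine(imageEmbedding, t) + logitBias;
    return { label: labels[i], score: 1 / (1 + Math.exp(-logit)) };
  });
}
```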

7. RoFormer for masked language modeling, sequence classification, token classification, and question answering. (#464)

import { pipeline } from '@xenova/transformers';


// Create a masked language modelling pipeline
const pipe = await pipeline('fill-mask', 'Xenova/antiberta2');


// Predict missing token
const output = await pipe('Ḣ Q V Q ... C A [MASK] D ... T V S S');
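`Xenova/antiberta2` is an antibody language model, and the query is written as space-separated residues with one `[MASK]` token at the position to predict. A minimal sketch of building such a query from a plain residue string; `maskResidue` is a hypothetical helper and assumes one character per residue:

```javascript
// Space-separate the residues and replace the one at `index` with the mask token.
function maskResidue(sequence, index, maskToken = '[MASK]') {
  const residues = sequence.split('');
  if (index < 0 || index >= residues.length) throw new RangeError('index out of range');
  residues[index] = maskToken;
  return residues.join(' ');
}
```

The resulting string can then be passed to `pipe(...)` in place of the hand-written query.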

8. Segment Anything Model (SAM)

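The Segment Anything Model (SAM) generates segmentation masks for objects in a scene, given an input image and input points. See here for the full list of pre-converted models. Support for this model was added in #510.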

Example + source code: https://huggingface.co/spaces/Xenova/segment-anything-web

Example: Perform mask generation with Xenova/slimsam-77-uniform.

import { SamModel, AutoProcessor, RawImage } from '@xenova/transformers';


const model = await SamModel.from_pretrained('Xenova/slimsam-77-uniform');
const processor = await AutoProcessor.from_pretrained('Xenova/slimsam-77-uniform');


const img_url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/corgi.jpg';
const raw_image = await RawImage.read(img_url);
const input_points = [[[340, 250]]]; // (x, y) location of a point on the object to segment


const inputs = await processor(raw_image, input_points);
const outputs = await model(inputs);


const masks = await processor.post_process_masks(outputs.pred_masks, inputs.original_sizes, inputs.reshaped_input_sizes);
console.log(masks);
// [
//   Tensor {
//     dims: [ 1, 3, 410, 614 ],
//     type: 'bool',
//     data: Uint8Array(755220) [ ... ],
//     size: 755220
//   }
// ]
const scores = outputs.iou_scores;
console.log(scores);
// Tensor {
//   dims: [ 1, 1, 3 ],
//   type: 'float32',
//   data: Float32Array(3) [
//     0.8350210189819336,
//     0.9786665439605713,
//     0.8379436731338501
//   ],
//   size: 3
// }

You can visualize the three predicted masks like this:

const image = RawImage.fromTensor(masks[0][0].mul(255));
await image.save('mask.png');
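Here `masks[0][0]` has dims `[3, 410, 614]`: three candidate masks stacked channel-first. Turning that tensor into an image places the three masks in the red, green, and blue channels, which is why a single mask is referred to by its colour in the next step. A minimal sketch of the channel-first to interleaved conversion on plain arrays (`chwToHwc` is a hypothetical helper, not the library's code):

```javascript
// Interleave channel-first [channels, height, width] boolean masks into
// height * width * channels bytes: mask 0 -> red, mask 1 -> green, mask 2 -> blue.
function chwToHwc(data, channels, height, width) {
  const size = height * width;
  const out = new Uint8Array(size * channels);
  for (let c = 0; c < channels; ++c) {
    for (let p = 0; p < size; ++p) {
      out[p * channels + c] = data[c * size + p] ? 255 : 0; // bool -> 0 / 255
    }
  }
  return out;
}
```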

(Figure: input image and visualized output.)

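Next, we select the channel with the highest IoU score, which in this case is the second (green) channel. Intersecting it with the original image gives an isolated version of the subject: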

(Figure: selected mask and its intersection with the input image.)
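With the IoU scores printed earlier (about 0.835, 0.979, and 0.838), the second mask wins. A minimal sketch of both steps on plain arrays: pick the best-scoring mask, then intersect it with the image by keeping covered pixels and making the rest transparent. The helper names and the RGBA cut-out approach are illustrative assumptions, not the demo's exact code:

```javascript
// Index of the candidate mask with the highest predicted IoU.
function bestMaskIndex(iouScores) {
  let best = 0;
  for (let i = 1; i < iouScores.length; ++i) {
    if (iouScores[i] > iouScores[best]) best = i;
  }
  return best;
}

// `rgb`: interleaved height * width * 3 pixel data.
// `masks`: channel-first [numMasks, height, width] boolean data.
// Returns RGBA data where pixels outside the chosen mask are fully transparent.
function cutout(rgb, masks, maskIndex, height, width) {
  const size = height * width;
  const out = new Uint8ClampedArray(size * 4); // zero-initialized: transparent black
  for (let p = 0; p < size; ++p) {
    if (masks[maskIndex * size + p]) {
      out[4 * p] = rgb[3 * p];
      out[4 * p + 1] = rgb[3 * p + 1];
      out[4 * p + 2] = rgb[3 * p + 2];
      out[4 * p + 3] = 255; // opaque inside the mask
    }
  }
  return out;
}
```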

Other improvements

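Fixed the HOSTNAME issue in the Next.js Dockerfile, by @Lian1230 in #461.
Added a link to the empty template in the README (#467).
Added support for processing non-square images with ConvNextFeatureExtractor (#503).
Encode revisions in remote URLs (#507).
@Lian1230 made their first contribution in #461.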
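Encoding matters because a revision can contain characters such as `/` (for example the `refs/pr/1` form used for pull-request branches on the Hub), which would otherwise be read as extra path segments. A minimal sketch, assuming the Hub's `{host}/{model}/resolve/{revision}/{file}` URL pattern; `buildRemoteURL` is a hypothetical helper, not the library's API:

```javascript
// Build a Hub file URL with the revision escaped as a single path segment.
function buildRemoteURL(modelId, revision, fileName, host = 'https://huggingface.co') {
  return `${host}/${modelId}/resolve/${encodeURIComponent(revision)}/${fileName}`;
}
```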

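Improved typing of the pipeline function in #485. Thanks to @wesbos for the suggestion!

This means that when you hover over a class name, you now get example code to help you.

A follow-up to #485 adds further IntelliSense-focused improvements (see the PR).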

Added support for cross-encoder models (+ fixed token type IDs) (#501)

Example: Information retrieval with Xenova/ms-marco-TinyBERT-L-2-v2.

import { AutoTokenizer, AutoModelForSequenceClassification } from '@xenova/transformers';


const model = await AutoModelForSequenceClassification.from_pretrained('Xenova/ms-marco-TinyBERT-L-2-v2');
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/ms-marco-TinyBERT-L-2-v2');


const features = tokenizer(
    ['How many people live in Berlin?', 'How many people live in Berlin?'],
    {
        text_pair: [
            'Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.',
            'New York City is famous for the Metropolitan Museum of Art.',
        ],
        padding: true,
        truncation: true,
    }
);


const { logits } = await model(features);
console.log(logits.data);
// quantized:   [ 7.210887908935547, -11.559350967407227 ]
// unquantized: [ 7.235750675201416, -11.562294006347656 ]
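A cross-encoder scores each (query, passage) pair jointly and produces one logit per pair: here the Berlin passage scores about 7.2 and the unrelated New York passage about -11.6. In retrieval, such logits are typically used to re-rank candidate passages. A minimal sketch of that ranking step, fed with the `logits.data` values printed above; `rankPassages` and the optional sigmoid relevance score are illustrative choices, not part of the library:

```javascript
// Sort passages by cross-encoder logit (higher = more relevant) and
// attach a sigmoid-mapped relevance in (0, 1) for readability.
function rankPassages(passages, logits) {
  return passages
    .map((text, i) => ({
      text,
      logit: logits[i],
      relevance: 1 / (1 + Math.exp(-logits[i])),
    }))
    .sort((a, b) => b.logit - a.logit);
}
```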


Original title: Transformers.js 2.13 and 2.14 Released, Adding 8 New Architectures

Source: WeChat official account 新机器视觉 (ID: vision263com). Please credit the source when reposting.
