Source | OSC Open Source Community (ID: oschina2013)

GitHub:
https://github.com/microsoft/visual-chatgpt
Paper:
https://arxiv.org/pdf/2303.04671.pdf

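Beyond its heavy investment in OpenAI, Microsoft is also building AI itself. Eight days ago, Microsoft open-sourced Visual ChatGPT, software that connects ChatGPT to a series of visual models so that images can be sent and received during a ChatGPT conversation.

As is well known, ChatGPT is very capable and can even be used to write novels and papers, but for now it is limited to text.

The arrival of Visual ChatGPT is like adding stickers for the first time to a text-only chat app, except that these are "custom stickers" generated automatically from the text the user types. That makes ChatGPT considerably more fun and widens the range of things it can be used for.

On one side, ChatGPT (or an LLM) acts as the general-purpose interface, providing image understanding and interaction with the user. On the other, the foundation image models act as the technical experts behind the scenes, supplying in-depth knowledge of specific domains.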

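The repository includes a diagram of the architecture and how it works:

The demo contains three different kinds of dialogue: Visual ChatGPT receives an image from the user; Visual ChatGPT edits the image according to the user's text and sends it back; and Visual ChatGPT recognizes the image and answers the user's questions about it. Based on the user's input, Visual ChatGPT decides whether it needs a VFM (Visual Foundation Model) to handle the request.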
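The "does this request need a VFM?" decision can be pictured with a small toy router. This is only a sketch under loud assumptions: in Visual ChatGPT the language model itself makes the decision, whereas here a plain keyword match stands in for it. The names `Tool`, `choose_tool`, `respond`, and `TOOLS`, the keyword lists, and the placeholder outputs are all hypothetical and are not taken from the repository; only the two tool names are borrowed from the model list used later in this article.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Tool:
    """A stand-in for one Visual Foundation Model (hypothetical structure)."""
    name: str
    keywords: List[str]          # trigger phrases; a crude substitute for the LLM's judgment
    run: Callable[[str], str]    # placeholder for the real model call


def choose_tool(user_input: str, tools: List[Tool]) -> Optional[Tool]:
    """Return the first tool whose keyword appears in the input, else None."""
    text = user_input.lower()
    for tool in tools:
        if any(keyword in text for keyword in tool.keywords):
            return tool
    return None


def respond(user_input: str, tools: List[Tool]) -> str:
    """Answer with plain text unless a visual tool is needed."""
    tool = choose_tool(user_input, tools)
    if tool is None:
        return f"[text-only reply] {user_input}"
    return f"[{tool.name}] {tool.run(user_input)}"


# Two illustrative tools with fake outputs (no real model is loaded).
TOOLS = [
    Tool("Text2Image", ["draw", "generate"],
         lambda prompt: f"<image generated for: {prompt}>"),
    Tool("VisualQuestionAnswering", ["what is in", "describe"],
         lambda prompt: f"<answer about the image for: {prompt}>"),
]

if __name__ == "__main__":
    for message in ["Hello there", "Please draw a cat", "What is in this picture?"]:
        print(respond(message, TOOLS))
```

The point of the sketch is the shape of the flow: a text-only path when no visual capability is needed, and a hand-off to a specialised model otherwise.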

The repository also lists the image models Visual ChatGPT uses and how much GPU memory each one needs:

For more detail, read the Visual ChatGPT paper on arXiv: https://arxiv.org/abs/2303.04671

Visual ChatGPT was released on March 10. As of 15:00 on March 16, the project had already collected 21.9K stars, a rocket-like rise.

Related link: https://github.com/microsoft/visual-chatgpt

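Usage

Note: running it locally requires a well-equipped machine with a GPU. If you have one, you can try it yourself; otherwise you can set it up on Google Colab.

Environment setup: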

# create the environment
conda create -n visgpt python=3.8
# activate the environment
conda activate visgpt
# install the dependencies
pip install -r requirements.txt
# download the models
bash download.sh

Quick start

# clone the repo
git clone https://github.com/microsoft/visual-chatgpt.git
# Go to directory
cd visual-chatgpt
# create a new environment
conda create -n visgpt python=3.8
# activate the new environment
conda activate visgpt
#  prepare the basic environments
pip install -r requirements.txt
# prepare your private OpenAI key (for Linux)
export OPENAI_API_KEY={Your_Private_Openai_Key}
# prepare your private OpenAI key (for Windows)
set OPENAI_API_KEY={Your_Private_Openai_Key}

# Start Visual ChatGPT !
# You can specify the GPU/CPU assignment with "--load"; the parameter indicates which
# Visual Foundation Models to use and where each one will be loaded
# The model and device are separated by an underscore '_', and different models are separated by a comma ','
# The available Visual Foundation Models can be found in the following table
# For example, if you want to load ImageCaptioning to cpu and Text2Image to cuda:0
# You can use: "ImageCaptioning_cpu,Text2Image_cuda:0"

# Advice for CPU Users
python visual_chatgpt.py --load ImageCaptioning_cpu,Text2Image_cpu
# Advice for 1 Tesla T4 15GB  (Google Colab)                       
python visual_chatgpt.py --load "ImageCaptioning_cuda:0,Text2Image_cuda:0"  
# Advice for 4 Tesla V100 32GB                            
python visual_chatgpt.py --load "ImageCaptioning_cuda:0,ImageEditing_cuda:0,
    Text2Image_cuda:1,Image2Canny_cpu,CannyText2Image_cuda:1,
    Image2Depth_cpu,DepthText2Image_cuda:1,VisualQuestionAnswering_cuda:2,
    InstructPix2Pix_cuda:2,Image2Scribble_cpu,ScribbleText2Image_cuda:2,
    Image2Seg_cpu,SegText2Image_cuda:2,Image2Pose_cpu,PoseText2Image_cuda:2,
    Image2Hed_cpu,HedText2Image_cuda:3,Image2Normal_cpu,
    NormalText2Image_cuda:3,Image2Line_cpu,LineText2Image_cuda:3"
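To make the `--load` format described in the comments above concrete, here is a minimal sketch of how such a string could be split into a model-to-device mapping. It is an illustration of the format only, not a claim about how `visual_chatgpt.py` parses the flag; the function name `parse_load_spec` is hypothetical. It assumes, as in the examples above, that model names contain no underscore.

```python
from typing import Dict


def parse_load_spec(spec: str) -> Dict[str, str]:
    """Split 'Model_device,Model_device,...' into {model: device}.

    Whitespace and line breaks around entries are ignored, so a value
    wrapped over several lines is read the same way as a one-line value.
    """
    assignments: Dict[str, str] = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        # Split on the first underscore: the left side is the model name,
        # the right side is the device (for example "cpu" or "cuda:0").
        model, sep, device = entry.partition("_")
        if not sep or not model or not device:
            raise ValueError(f"expected Model_device, got {entry!r}")
        assignments[model.strip()] = device.strip()
    return assignments


if __name__ == "__main__":
    print(parse_load_spec("ImageCaptioning_cpu,Text2Image_cuda:0"))
    # {'ImageCaptioning': 'cpu', 'Text2Image': 'cuda:0'}
```

Running a check like this on a long `--load` value before starting the program is a cheap way to catch a missing underscore or a mistyped device name.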
