Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wip] add mix scheme #664

Merged
merged 3 commits into from
Sep 24, 2024
Merged

[wip] add mix scheme #664

merged 3 commits into from
Sep 24, 2024

Conversation

lyuwenyu
Copy link
Collaborator

@lyuwenyu lyuwenyu commented Aug 5, 2024

PaddleMIX统一多模数据格式

  1. 纯文
  2. 单图
  3. 多图
  4. interleaved
  5. 音频
  6. 视频

功能

  1. MIX格式定义和检查
  2. MM格式到MIX格式转换Op

特殊字段

  1. images <-> <image>id</image>
  2. audios <-> <audio>id</audio>
  3. videos <-> <video>id</video>
[
    {
        'id': '000002b66c9c498e',
        'images': [
                {
                    'id': 0,
                    'url': 'train/000002b66c9c498e.jpg', 
                    'heigh': 100,
                    'width': 100,
                }, 
                {
                    'id': 1,
                    'url': 'train/000002b66c9c498e.jpg', 
                    'heigh': 100,
                    'width': 100,
                }, 
            ],
        'conversations': [
                {
                    'from': 'user', 
                    'value': '<image>id</image><image>id</image> xxxx'
                }, 
                {
                    'from': 'assistant', 
                    'value': 'xxx'
                },
                {
                    'from': 'user', 
                    'value': 'xxxx <image>id</image>'
                }, 
                {
                    'from': 'assistant', 
                    'value': 'xxx'
                }
            ],
    },
]

Copy link

paddle-bot bot commented Aug 5, 2024

Thanks for your contribution!

@lyuwenyu lyuwenyu force-pushed the mix_schema branch 2 times, most recently from 592fe93 to d5650b3 Compare August 13, 2024 03:18
@lyuwenyu lyuwenyu changed the title [wip] add mix schema [wip] add mix scheme Sep 24, 2024
@lyuwenyu lyuwenyu merged commit 82a867f into PaddlePaddle:develop Sep 24, 2024
2 checks passed
ZhijunLStudio pushed a commit to ZhijunLStudio/PaddleMIX that referenced this pull request Jan 10, 2025
## PaddleMIX统一多模数据格式
1. [x] 纯文
2. [x] 单图
3. [x] 多图
4. [x] interleaved
5. [ ] 音频
6. [ ] 视频 

## 功能
1. [x]  `MIX`格式定义和检查
2. [x] `MM`格式到`MIX`格式转换Op

---

## 特殊字段
1. [x] `images <-> <image>id</image>`
2. [ ] `audios <-> <audio>id</audio>`
3. [ ] `videos <-> <video>id</video>`


```
[
    {
        'id': '000002b66c9c498e',
        'images': [
                {
                    'id': 0,
                    'url': 'train/000002b66c9c498e.jpg', 
                    'heigh': 100,
                    'width': 100,
                }, 
                {
                    'id': 1,
                    'url': 'train/000002b66c9c498e.jpg', 
                    'heigh': 100,
                    'width': 100,
                }, 
            ],
        'conversations': [
                {
                    'from': 'user', 
                    'value': '<image>id</image><image>id</image> xxxx'
                }, 
                {
                    'from': 'assistant', 
                    'value': 'xxx'
                },
                {
                    'from': 'user', 
                    'value': 'xxxx <image>id</image>'
                }, 
                {
                    'from': 'assistant', 
                    'value': 'xxx'
                }
            ],
    },
]
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants