Xbox One Kinect Can Understand Two Voices Speaking at Once

Michael Epstein

New member
Sep 9, 2013
464
0
0
Players don't need to be polite and take turns talking to the Xbox One.

The Xbox One version of Kinect can detect and differentiate between multiple voices speaking at the same time, according to Microsoft VP Phil Harrison. Speaking at the Eurogamer Expo in London, Harrison explained that the Xbox One's new and improved motion control system can understand two distinct sets of voices, even when they're speaking at the same time. Microsoft new technology lead developer Nick Burton added that the Kinect will also detect whether players' mouths are moving, even in a dark room.

Microsoft has made it very clear that the Xbox One version of Kinect, which comes packaged with the upcoming console, will be a massive improvement over the first version of the peripheral. The Xbox One Kinect can detect 25 joints spread across as many as six people, calculate player heart rates and detect as many as 1,400 points of articulation on their faces. The Kinect can also determine which people in the room are actively using the controller versus spectators.

Even though the Kinect is technically not a required component of the Xbox One experience, third-party developers are looking into new ways to use the peripheral's new range of features, ranging from Harmonix' use of musical gestures in Battlefield 4.

Source: Polygon [http://www.polygon.com/2013/9/30/4782796/new-kinect-can-understand-two-people-talking-at-the-same-time]


Ed130 The Vanguard

(Insert witty quote here)
Sep 10, 2008
3,782
0
0
But can it understand this?


(and yes Kinect will be fully functional at launch for people in Scotland)
 

Teoes

Poof, poof, sparkles!
Jun 1, 2010
5,174
0
0
Michael Epstein said:
The Xbox One Kinect can detect 25 joints spread across as many as six people..


Woah, man.

OT: The more of these things I hear about Kinect the more I remain unimpressed, but get increasingly scared instead. What does it do if those two simultaneous voices are each commanding the Xbone to do something? Please say "it explodes".
 

TiberiusEsuriens

New member
Jun 24, 2010
834
0
0
Ed130 said:
But can it understand this?

clip

(and yes Kinect will be fully functional at launch for people in Scotland)
I DID! I think my ears are done for the day, but it is in fact possible. Now the question is, can it understand that with bagpipes in the background (because all Scottish people have them in their houses, duh)?
 

Teoes

Poof, poof, sparkles!
Jun 1, 2010
5,174
0
0
TiberiusEsuriens said:
I DID! I think my ears are done for the day, but it is in fact possible. Now the question is, can it understand that with bagpipes in the background (because all Scottish people have them in their houses, duh)?
I work on a busy central street in Glasgow and the frequency with which we have to tolerate bagpipers in earshot is maddening. People phone us, and yes, bloody pipes can be heard in the background. They're not helping the stereotype!
 

Spaceman Spiff

New member
Sep 23, 2013
604
0
0
Wow, the amount of information this thing can monitor is incredible. I don't want one anywhere near me.
 

Erttheking

Member
Legacy
Oct 5, 2011
10,845
1
3
Country
United States
Yeah that's cool, but if two people give it conflicting orders what will it do?
 

lacktheknack

Je suis joined jewels.
Jan 19, 2009
19,316
0
0
Whelp, if someone says "Xbox off", there's not a dang thing you can do about it or drown them out with. :p
 

Mahorfeus

New member
Feb 21, 2011
996
0
0
While I do not appreciate the price bloat that comes from the new Kinect being shoved down our throats, I do have to admit that I am anticipating its various improvements over the original. It is certainly something I can see myself using more if it ends up being as functional as they say. That being said, the new technology doesn't mean squat if developers won't take advantage of it, which I think is a bit of a shame.
 

Diablo1099_v1legacy

Doom needs Yoghurt, Badly
Dec 12, 2009
9,732
0
0
Okay, how the hell did the OP get banned? Was he a MS rep or something? o_O Nevermind, Site Hiccup on my end ^^'

Still, $100 for THAT!? Sheesh, Priorities much?
 

DataSnake

New member
Aug 5, 2009
467
0
0
So what happens if two people give it mutually exclusive orders at the same time? Like, one says "watch TV" and the other says "play Halo", or something like that?
 

Petromir

New member
Apr 10, 2010
593
0
0
DataSnake said:
So what happens if two people give it mutually exclusive orders at the same time? Like, one says "watch TV" and the other says "play Halo", or something like that?
Probably one of a number of things that current hardware does when given conflicting instructions (often varying by the program running on the same OS): it will likely perform either the first or the last instruction until it completes, or possibly run the instructions sequentially. It's a computer, so it will somehow register one before the other to decide this; the chances of two instructions finishing too close together for it to decide which was first are fairly small. I'm skeptical about how well this will work, but that part is relatively simple. Hell, I'd not be surprised if they didn't really need to specifically code for it and just let it happen, as it's fairly likely one of the possibilities above will happen anyway.
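To make those three possibilities concrete, here is a minimal sketch in Python. Nothing in it comes from Microsoft: the `Command` record, the `finished_at` timestamp and the policy names are all made up for illustration, and it only assumes the system can tell which utterance ended first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    speaker: str        # hypothetical label for whoever spoke
    text: str           # the recognised phrase
    finished_at: float  # seconds; when the utterance ended (assumed to be known)


def resolve(commands, policy="first"):
    """Decide which commands run when several arrive close together."""
    # Order by the moment each utterance finished, earliest first.
    ordered = sorted(commands, key=lambda c: c.finished_at)
    if not ordered:
        return []
    if policy == "first":       # earliest finished command wins
        return [ordered[0]]
    if policy == "last":        # latest finished command wins
        return [ordered[-1]]
    if policy == "sequential":  # run them all, in the order they finished
        return ordered
    raise ValueError(f"unknown policy: {policy}")


pending = [
    Command("A", "watch TV", finished_at=2.31),
    Command("B", "play Halo", finished_at=2.29),
]
for policy in ("first", "last", "sequential"):
    print(policy, [c.text for c in resolve(pending, policy)])
```

Whichever policy is used, a 20 ms gap between the two timestamps is enough to break the tie, which is the point above about near-simultaneous finishes being rare.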

Spaceman Spiff said:
Wow, the amount of information this thing can monitor is incredible. I don't want one anywhere near me.
For the purposes of monitoring you, it makes far more sense to hide a microphone in it and process it at their HQ than use a dirty great obvious thing like Kinect 2.0, which to boot can be unplugged.....

A mobile phone with inbuilt GPS and the like is a far more sensible starting point to monitor people with than a console.
 

DiamanteGeeza

New member
Jun 25, 2010
240
0
0
Michael Epstein said:
The Xbox One version of Kinect can detect and differentiate between multiple voices speaking at the same time

No it won't. I'm telling you categorically right now that if two people are speaking at exactly the same time and are standing right next to each other, it won't have a clue what's being said. Understanding what a human voice is saying is really, really hard, and I know because I've dabbled in writing systems that do just that during my (long) career. Understanding a single voice is difficult enough (which is why there will only be a few countries supported by the damn thing), but as soon as you add any layer of mush over the top (be it a drill, a car, a TV in the background, or another human voice), the audio signal you get is a garbled mess. Sure, you can filter out what isn't likely to be a human voice - extremely high and low frequencies, for example - but two human voices at the same time occupy a similar range of pitch, so separating the two by using a generic catch-all algorithm in real-time is, basically, impossible.

However, bearing in mind that the Kinect has a 4 mic array, I'm guessing that it will rely on the two people speaking to be a good distance apart, and can then bias a particular person's voice using a particular input based on their location. So if I'm standing to the extreme left, and someone else is standing at the extreme right, then our voices would attempt to be decoded by isolating the mic input nearest to us. This is the only way I can think of that has any faint hope of working. And even then, it'll get it wrong for much of the time, I suspect.

If the two speakers are standing either right next to each other, or in a line, it won't have any idea what either person is saying.
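A toy Python sketch of that guess, purely for illustration: the microphone positions, the speaker positions and the "pick the nearest mic" rule are all assumptions, not how Kinect is known to work. It only shows why the idea depends on the speakers being far apart.

```python
# Hypothetical layout: four microphones along a bar, x-positions in metres.
MIC_POSITIONS = [-0.11, -0.04, 0.04, 0.11]


def nearest_mic(speaker_x, mics=MIC_POSITIONS):
    """Index of the microphone closest to a speaker standing at speaker_x."""
    return min(range(len(mics)), key=lambda i: abs(mics[i] - speaker_x))


def assign(speakers):
    """Map each speaker to a mic and report whether every speaker got their own."""
    chosen = {name: nearest_mic(x) for name, x in speakers.items()}
    separable = len(set(chosen.values())) == len(chosen)
    return chosen, separable


far_apart = {"left": -1.5, "right": 1.5}  # opposite ends of the room
side_by_side = {"a": 0.5, "b": 0.6}       # standing next to each other

print(assign(far_apart))
print(assign(side_by_side))
```

With the speakers at opposite ends, each gets a different mic; side by side, both land on the same mic and the scheme has nothing left to tell them apart.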
 

Strazdas

Robots will replace your job
May 28, 2011
8,407
0
0
lacktheknack said:
Whelp, if someone says "Xbox off", there's not a dang thing you can do about it or drown them out with. :p
Kinect should provide a disposable knife or other sharp object for situations like these so you can make sure it won't happen again.

DiamanteGeeza said:
However, bearing in mind that the Kinect has a 4 mic array, I'm guessing that it will rely on the two people speaking to be a good distance apart, and can then bias a particular person's voice using a particular input based on their location. So if I'm standing to the extreme left, and someone else is standing at the extreme right, then our voices would attempt to be decoded by isolating the mic input nearest to us. This is the only way I can think of that has any faint hope of working. And even then, it'll get it wrong for much of the time, I suspect.

If the two speakers are standing either right next to each other, or in a line, it won't have any idea what either person is saying.
That's really the ONLY way you can ever do that anyway. If you use the same mic for both voices, the voice waves overlap anyway, and the software would have to be specifically told that both people are speaking at the same time and have their voice samples to compare against to even begin to try to understand anything.
Though the more likely situation is actually that it's more PR talk and it won't work. You know, pretty much like everything with the old Kinect.
 

DiamanteGeeza

New member
Jun 25, 2010
240
0
0
Strazdas said:
That's really the ONLY way you can ever do that anyway. If you use the same mic for both voices, the voice waves overlap anyway, and the software would have to be specifically told that both people are speaking at the same time and have their voice samples to compare against to even begin to try to understand anything.
Nope, you can't do it. It doesn't matter if you have prior, clear samples of the voices of the people talking: unless one person has a really high-pitched voice and the other has a very, very deep voice, it's simply not possible to separate the two if it's recorded using a single mic.
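A small Python sketch of why a single recording is ambiguous. The sample values are invented integers standing in for PCM audio, and the only assumption is that one microphone records the sample-by-sample sum of the two voices.

```python
def mix(a, b):
    """What one microphone hears: the sample-wise sum of both voices."""
    return [x + y for x, y in zip(a, b)]


# Two made-up "voices" as integer samples.
voice_a = [200, -100, 400, 0]
voice_b = [100, 300, -200, 500]

# A completely different pair of made-up signals.
other_a = [300, 0, 100, 250]
other_b = [0, 200, 100, 250]

recording = mix(voice_a, voice_b)
print(recording)
print(recording == mix(other_a, other_b))
```

Both pairs produce the same recording, so the mixture alone cannot say which pair was actually spoken. Any separation has to lean on extra information, such as a large pitch difference or a second microphone.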
 

Strazdas

Robots will replace your job
May 28, 2011
8,407
0
0
DiamanteGeeza said:
Strazdas said:
That's really the ONLY way you can ever do that anyway. If you use the same mic for both voices, the voice waves overlap anyway, and the software would have to be specifically told that both people are speaking at the same time and have their voice samples to compare against to even begin to try to understand anything.
Nope, you can't do it. It doesn't matter if you have prior, clear samples of the voices of the people talking: unless one person has a really high-pitched voice and the other has a very, very deep voice, it's simply not possible to separate the two if it's recorded using a single mic.
I meant more that it would know the exact pitch level and would know what to look for. So yes, it is the high versus deep. Though that wouldn't be a problem for me personally, as I've got a really deep one, so deep in fact that quite a few internet voice chat programs filter me out as background noise (it's always fun when that happens). But yep, two voices in the same mic is quite impossible in the average household.