
[MUSIC].

Alright, so more generally, you can have
what we'll call a theta-join.
And this is essentially just a join, but
the condition here can be anything you
want.
Okay.
Rather than just an equality condition.
This could be greater than or less than,
or arbitrary functions, and so on.
Okay, and so the all-pairs similarity
test that I talked about before is an
example of a theta-join.
And yeah, we'll see a more detailed
example in a second.
Right, and so, just to point out that
the equi-join itself is a special case
of theta-join, where theta is just
the equality condition.
Alright, so here's some examples of
theta-joins, just to sort of demonstrate
that these come up pretty often in
practice, more than you might be familiar
with.
And again, especially speaking to the
people who have experience with
databases: you know, these are not going
to be along foreign key relationships
quite as often.
Right, okay.
So if you want to, say, find all
hospitals within five miles of a school,
well, you know, this doesn't immediately
seem like a relational algebra query or
a SQL query.
But it kind of is, right, it's just a
join where the join condition is this
distance function over the location of a
hospital and the location of a school.
Okay, and then I did a projection here to
project out the name of the hospital,
because the English version of this
seemed to suggest that we just want the
name of the hospital and that's it.
Okay, and so in SQL, this might look
like this, where you say, look, give me
all combinations of the hospitals and
schools, and then filter on the ones
where the location of the hospital is
less than five miles away from the
location of the school.
And here, I'm kind of assuming that there
exists some distance function that knows
how to compute this.
And we'll talk at the end about how new
functions that are not part of the
language, are not part of relational
algebra, can be registered in the system,
and that's this notion of user-defined
functions.
But trust me for right now that these
things can exist.
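
As a rough sketch of the hospital and school query, here is how it might look in SQLite through Python's sqlite3 module. Everything specific here is an assumption for illustration: the table layouts (Hospitals and Schools with lat and lon columns), the sample rows, and the distance function, which is a haversine helper registered as a user-defined function with create_function.

```python
import math
import sqlite3

def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in miles. Hypothetical helper."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it as distance(...).
conn.create_function("distance", 4, distance_miles)

# Assumed schema and made-up rows, only for demonstration.
conn.executescript("""
    CREATE TABLE Hospitals (name TEXT, lat REAL, lon REAL);
    CREATE TABLE Schools (name TEXT, lat REAL, lon REAL);
    INSERT INTO Hospitals VALUES ('Near Hospital', 47.60, -122.33),
                                 ('Far Hospital', 48.75, -122.48);
    INSERT INTO Schools VALUES ('Central School', 47.61, -122.32);
""")

# Theta-join: all hospital/school combinations, filtered by a distance
# condition instead of an equality condition, then projected to the name.
nearby = conn.execute("""
    SELECT DISTINCT h.name
    FROM Hospitals h, Schools s
    WHERE distance(h.lat, h.lon, s.lat, s.lon) < 5
""").fetchall()
print(nearby)
```

The join condition is just an arbitrary predicate over a pair of tuples, which is all that theta means here.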
Okay.
And in fact, they don't even have to be
user-defined.
There are many functions that are already
available in databases for manipulating,
say, for example, strings.
And in fact, even for geographic
information, there actually are distance
functions available in commercial
databases.
Okay.
So you'll see this structure, and the
takeaway here is that I want you to still
think join, right?
Just because you don't see an equality
condition doesn't mean there's not a join
going on, it's the same kind of join as
everything else.
And then, the other thing, the other
takeaway is just to know the term
theta-join in case that comes up, okay.
Usually, when anybody's talking about
theta-join, what they mean is, you know,
difficult joins, right?
Arbitrary joins, the general case of
joins.
Alright.
So, another example that you might be
able to think about coming up in
practice in your own work is, well, you
know, find all the user clicks made
within five seconds of some page load.
And this is much like the distance
argument before.
But now we can think about it in terms of
time, which is just one-dimensional, and
a one-dimensional distance metric is
easier to define.
So we say, find the click time minus the
load time of the page.
Right, so c.click and p.load, and take
that absolute value and see where that's
less than five.
And so, this might be when you're trying
to find people who find what they're
looking for quickly.
Right, this is a metric that web
analytics people might use frequently.
Right, if people sort of stare at a page
for a long time, maybe if it's an
article, that might be good, that means
they're reading the article.
If it's a navigation page, it may be bad.
That means they don't find
what they're looking for quickly.
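
A minimal sketch of the click query, assuming two hypothetical tables, Clicks with a click column and PageLoads with a load column, both holding times in seconds. The table names, the extra columns, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: times are plain numbers of seconds.
conn.executescript("""
    CREATE TABLE Clicks (user TEXT, click REAL);
    CREATE TABLE PageLoads (page TEXT, load REAL);
    INSERT INTO Clicks VALUES ('alice', 102.0), ('bob', 130.0);
    INSERT INTO PageLoads VALUES ('home', 100.0), ('search', 200.0);
""")

# Theta-join on a one-dimensional distance: |click - load| < 5.
quick_clicks = conn.execute("""
    SELECT c.user, p.page
    FROM Clicks c, PageLoads p
    WHERE abs(c.click - p.load) < 5
    ORDER BY c.user
""").fetchall()
print(quick_clicks)
```

Only the click at 102 seconds pairs with the page load at 100 seconds; the click at 130 is not within five seconds of any load.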
Okay, you might also hear about band
joins or range joins, and this is things
like: there might be an interval of
time, a start time and an end time, in
one table.
And you're trying to find tuples from
another table that fall within that
interval.
Okay.
And we'll actually see an example of
that.
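
In the meantime, a band join can be sketched like this, assuming a hypothetical Sessions table holding the interval and a hypothetical Events table holding the points in time. None of these names or rows come from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: one table of intervals, one table of timestamps.
conn.executescript("""
    CREATE TABLE Sessions (session_id INTEGER, start_time REAL, end_time REAL);
    CREATE TABLE Events (event_id INTEGER, event_time REAL);
    INSERT INTO Sessions VALUES (1, 0.0, 10.0), (2, 20.0, 30.0);
    INSERT INTO Events VALUES (100, 5.0), (101, 15.0), (102, 25.0);
""")

# Band (range) join: keep the pairs where the event time falls inside
# the session's [start_time, end_time] interval.
matches = conn.execute("""
    SELECT s.session_id, e.event_id
    FROM Sessions s
    JOIN Events e
      ON e.event_time BETWEEN s.start_time AND s.end_time
    ORDER BY s.session_id, e.event_id
""").fetchall()
print(matches)
```

Event 101 at time 15 falls in neither interval, so it has no partner and drops out of the result.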
Okay.
There are other joins. Another join to
recognize is this notion of an outer
join.
And here, what you're saying is, you want
all the tuples from the left.
For R1 and R2, we'll write it like this,
with this sort of missing leg here.
You want all the tuples from the left
side, if you've written it this way.
And if the tuple on the right hand side
matches, great.
You put it out, and it's just like a
regular join.
But if there is no match, you still
include the R1 tuple, and you pad out
the other columns with null as needed,
okay.
So any value you don't have, make it a
null.
Alright, and so the variants here, which
aren't particularly important, are left
outer join, right outer join, and full
outer join, which is a little bit hard
to write because it looks like a cross
product.
The right outer join you really sort of
never need, because you can always just
reorder the operands.
Full outer join means that you want
everything from both sides, padded out
with null.
And these are sort of ugly to reason
about formally, but they come up pretty
often in practice, especially for users,
and I say users as opposed to
applications.
You know, if you're writing a query by
hand, many times people find it
surprising that records in their table
disappear because they joined it with
another table.
Alright, but that can happen because you
said that you only want pairs of tuples
where some condition matches.
And so you might have no matches, and
things disappear.
And so outer join comes up as a useful
way to match more closely what SQL
programmers expect, especially novices.
Okay.
Alright.
And so, an example of this.
We have two tables here, Anonymous
Patient and Anonymous Job, and we could
do an outer join.
Now, what columns did we join on here?
Well, it doesn't specify, we sort of
omitted it here.
Although technically, we should write
that, you know, right there in the
subscript of the join operator.
But we didn't, and of course, you can
probably figure it out.
Well, so you look at the columns that
they have in common.
Right, this has an age column and this
has a zip column.
And this has an age column and this has a
zip column.
And so it's actually on both of those
columns, alright?
For every tuple in P, find me a
corresponding tuple in Anonymous Job, J,
with the same age and zip, for example
where age equals 54 and zip equals 98125.
Okay.
So, if we had just done a join, then this
tuple would be removed from the output,
because it has no corresponding tuple on
this side.
There is no 33 98120.
But because we did an outer join, we do
include the tuple, and we padded it with
a null here.
Okay.
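
Here is a small sketch of that outer join in SQLite. The age and zip values 54/98125 and 33/98120 are the ones mentioned above; the table names AnonPatient and AnonJob, the disease and job columns, and their values are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema; only the age and zip values come from the example above.
conn.executescript("""
    CREATE TABLE AnonPatient (age INTEGER, zip INTEGER, disease TEXT);
    CREATE TABLE AnonJob (age INTEGER, zip INTEGER, job TEXT);
    INSERT INTO AnonPatient VALUES (54, 98125, 'heart'), (33, 98120, 'lungs');
    INSERT INTO AnonJob VALUES (54, 98125, 'lawyer');
""")

# Regular join: the (33, 98120) patient has no partner and disappears.
inner = conn.execute("""
    SELECT p.age, p.zip, p.disease, j.job
    FROM AnonPatient p
    JOIN AnonJob j ON p.age = j.age AND p.zip = j.zip
""").fetchall()

# Left outer join: every patient is kept; a missing job becomes NULL,
# which Python's sqlite3 module returns as None.
outer = conn.execute("""
    SELECT p.age, p.zip, p.disease, j.job
    FROM AnonPatient p
    LEFT OUTER JOIN AnonJob j ON p.age = j.age AND p.zip = j.zip
    ORDER BY p.age
""").fetchall()
print(inner)
print(outer)
```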
And we write this in SQL, I wish I'd
included this.
Let me back up a couple of steps here and
I'll show you.
So just like we have join here, you can
actually write outer join explicitly, if
you wanted to.
I'm not going to leave that in the slide
because it'll be confusing, since it's
out of context.
And, in fact, you can say left outer
join, I guess, to match our example.
Okay.

As put in the slides, the database that
you'll be using in the assignment is
SQLite, which has some nice properties
for a single-user case.
The entire database is stored as a
single file and you can pass it around
and so on.
So it's a good tool to sort of have in
your toolbox, it's one of the reasons why
I selected it for the assignments.
But it actually has some limitations,
one of which is that it can't express
certain kinds of outer joins.
Okay, in particular, full outer join.
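
One common workaround, sketched here under assumed table names R1 and R2 with a made-up key column k, is to emulate a full outer join with two left outer joins glued together by UNION ALL. SQLite added native RIGHT and FULL OUTER JOIN support in version 3.39.0, so this is only needed on older versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: key 2 matches, key 1 is only in R1, key 3 only in R2.
conn.executescript("""
    CREATE TABLE R1 (k INTEGER, a TEXT);
    CREATE TABLE R2 (k INTEGER, b TEXT);
    INSERT INTO R1 VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO R2 VALUES (2, 'b2'), (3, 'b3');
""")

# First branch: everything from R1, padded with NULL where R2 has no match.
# Second branch: only the R2 tuples that found no partner in R1.
full_outer = conn.execute("""
    SELECT R1.k, R1.a, R2.k, R2.b
    FROM R1 LEFT OUTER JOIN R2 ON R1.k = R2.k
    UNION ALL
    SELECT R1.k, R1.a, R2.k, R2.b
    FROM R2 LEFT OUTER JOIN R1 ON R1.k = R2.k
    WHERE R1.k IS NULL
""").fetchall()
print(full_outer)
```

The WHERE clause in the second branch keeps the matched pairs from being emitted twice.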
