
VIDEO RETRIEVAL PROJECT REPORT

Sohaib Abdul Rehman, Usama Mehmood and Salik Mahmood

1. INTRODUCTION

The efficient and accurate retrieval of videos similar to a user's query is much more difficult than image retrieval. On YouTube alone, more than 48 hours of video are posted every minute [1]. This fast expansion of video databases requires fast and efficient retrieval methods. The aim of this project is to retrieve the videos that best match the query made by the user. The query can be in the form of a video or of text.

An important point to note about the data set is that the data set provided on LMS (seasons of Merlin and Sons of Anarchy) was not well suited for working with semantic concepts: it consisted of very long videos, and the concepts present in them were not diverse, so very vague results were obtained. To overcome this problem, we took a separate set of videos containing diverse concepts and ran the semantic concept method on that data base. Consequently, this part could not be compared with parts 1 and 2 of the project, which used the Merlin and Sons of Anarchy videos.

In the first part of the project, we looked at content based video retrieval (CBVR). This concept is an extension of content based image retrieval and is discussed in section 2.A of the report. The second part of the project was to retrieve videos from a data set using the method of tiny videos [2]. This method converts the frames of a video into 32×32 feature vectors and also incorporates the temporal dependence of the video (which CBVR does not); it is discussed in detail in section 2.B. Finally, we looked at video retrieval using the semantic concepts present in a video, and formulated a method for automatic annotation of videos. This part of the project is described in section 2.C.

We also discuss the merits and demerits of all the methods in section 2. The results from the three parts of the project are discussed in section 3.

2. METHODOLOGY

In this part of the report we explain our approach for CBVR, tiny videos, and semantic concept based video retrieval.

A. Content Based Video Retrieval

As discussed earlier, CBVR is an extension of CBIR. A video can be sampled to obtain images (frames), so it makes sense to use image retrieval methods for video retrieval.

For CBVR we extracted key frames from the video by uniform sampling (we sampled at 1/15 frames per second, i.e. one frame every 15 seconds). We then divided each frame into 9 blocks. Three histograms (for the red, green, and blue channels) were computed for each block and concatenated to obtain the feature vector of that block. The block feature vectors were concatenated to form the feature vector of the entire frame, and the frame feature vectors were in turn concatenated to obtain the feature vector of the video.

To save space and make the method more efficient, we quantized the histograms; section 3.A looks at the effect of quantization on video retrieval. Another way to make the process more efficient is to use a global histogram of the whole frame (instead of dividing it into blocks). Although this discards the spatial layout of the frame, it is time efficient. Results for global histograms are also discussed in section 3.A.

This technique has certain limitations because CBVR does not include the temporal dependency of the frames. Since we used uniform sampling, two videos that differ only by a delay of one second are not rendered similar. Moreover, since there is no dependency on the concepts present in the video, the method can declare two totally different videos (on the basis of concept) to be similar.

B. Tiny Videos based Video Retrieval

We used uniform sampling in this method as well, taking one frame after every 15 second interval of the video.

B.1 Data Base:

After extracting the key frames, we built a data base of the feature matrices of the videos. The feature matrix of a video was computed by computing the feature vector of every key frame and stacking these vectors as rows. The feature vector was computed by down sampling the key frame to 32×32 pixels. The three color channels were then concatenated to obtain a vector, which was normalized and made zero mean to obtain the feature vector. Making the vectors zero mean was necessary to cancel the effect of brightness differences between two otherwise similar images.
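A minimal numpy sketch of this feature-matrix construction (our own illustration, not the report's code; the down-sampling here is a simple block average, which the report does not specify):

```python
import numpy as np

def tiny_feature(frame):
    """Down sample an H x W x 3 frame to 32 x 32, concatenate the three
    channels, then normalize and remove the mean (as in section 2.B.1)."""
    h, w, _ = frame.shape
    # Block-average down-sampling to 32 x 32 (assumes 32 divides h and w).
    small = frame.astype(float).reshape(32, h // 32, 32, w // 32, 3).mean(axis=(1, 3))
    v = small.ravel()              # concatenated channels: 32*32*3 values
    v -= v.mean()                  # zero mean cancels brightness offsets
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def feature_matrix(key_frames):
    # One row per key frame, as described above.
    return np.stack([tiny_feature(f) for f in key_frames])
```

Because the mean is removed before normalization, adding a constant brightness offset to a frame leaves its feature vector unchanged.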
B.2 Similar Video Finding:

For this part of the project we assume two videos to be similar if they have at least one similar frame. The distance between videos 'a' and 'b' was therefore computed as the minimum over all key frame pairs:

D(a, b) = min over all pairs (Ia, Ib) of D²ssd(Ia, Ib)

where D²ssd(Ia, Ib) was defined as the sum of squared differences of the feature vectors:

D²ssd(Ia, Ib) = Σi (Ia(i) − Ib(i))²

Here Ia and Ib are the feature vectors of key frame 'a' and key frame 'b' respectively; Ia belongs to video 'a' and Ib belongs to video 'b'.

We define the correlation between two videos as the maximum correlation over all key frame pairs; since the feature vectors are normalized and zero mean, the correlation of a frame pair is their dot product Ia · Ib. We consider two videos to be similar if their correlation is above a certain threshold. The effect of the threshold is discussed in section 3.B.

Contrary to CBVR, this method takes into account the temporal nature of the video. Its limitations are due to the fact that there is no dependency on the concepts present in the video: two videos with totally different concepts can be considered similar by this method. Moreover, as uniform sampling was used, videos with delayed frames will not be rendered similar. The method also depends on the length of the video.

C. Semantic Concepts

This method was based on video retrieval on the basis of the semantic concepts present in the video. It consists of the following steps.

C.1 Manual Annotation

As the data set provided on LMS was not annotated, the first task was to manually annotate the videos present in it. The annotation was done by first detecting the shots in a video and then labeling each shot with the concepts present in it. This was done for all the shots in a video.

C.2 Shot Detection

The shots in the video were detected on the basis of intensity of motion. The difference between consecutive frames was computed, and the 25 frames for which this difference was maximum were selected, giving the boundaries of 25 shots in the video. Figures 1 and 2 each show 9 uniformly sampled frames of a shot. As seen from the figures, the frames of a shot belong to a similar concept, so it makes sense to give all the frames of a shot the same label.

Figure 1: A Shot

Figure 2: A Shot

C.3 DCT Co-efficient:

The next step was to select frames from all the shots of a video, divide them into groups of 16×16 pixels, and run the DCT on each group. This gave us 256 coefficients per group; we kept the first 100 coefficients of each group (40% of the coefficients). We selected 20 frames from each shot.

We do not lose much information by discarding the remaining DCT coefficients, as most of the energy is present in the low DCT coefficients. Figure 3 shows the result of image reconstruction from 67%, 50%, and 25% of the coefficients. The image is reconstructed without much loss of information (although with some blurring in the 25% case, because only the very low DCT coefficients are used).

An advantage of this method is that, since we used intensity of motion to evaluate the shot boundaries, videos with delayed frames will be considered similar, which was not the case in the first two parts of the project. Moreover, as we have included the concepts present in the video, only videos with a similar concept will be declared similar. On the other hand, the algorithm needs a lot of training data, and this large amount of data can cause memory problems when fitting a distribution (a reason why we used clustering instead of a GMM).
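The energy-compaction argument above can be illustrated with a small numpy sketch (our own illustration, not the report's code): run an orthonormal 2-D DCT on a 16×16 block, keep only the 100 lowest-frequency coefficients, and reconstruct.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix (rows are basis vectors), so D @ D.T = I.
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)
    return m

def truncate_dct(block, keep=100):
    """2-D DCT of a square block; zero all but the `keep` lowest-frequency
    coefficients (ordered by diagonal u+v, roughly zig-zag order) and
    reconstruct the block."""
    n = block.shape[0]
    d = dct_matrix(n)
    coeffs = d @ block @ d.T                 # n*n coefficients (256 for 16x16)
    u, v = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    order = np.argsort((u + v).ravel(), kind="stable")
    mask = np.zeros(n * n, dtype=bool)
    mask[order[:keep]] = True                # keep the low-frequency set
    kept = np.where(mask.reshape(n, n), coeffs, 0.0)
    return d.T @ kept @ d                    # inverse transform
```

For smooth image blocks, almost all of the energy sits in the low-frequency coefficients, so reconstructing from 100 of the 256 coefficients changes the block very little.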

Figure 3: DCT Results
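Looking back at section C.2, the shot-boundary selection can be sketched as follows (a hypothetical sketch: the report does not specify the frame-difference measure, so the sum of absolute intensity differences is assumed here):

```python
import numpy as np

def shot_boundaries(frames, n_shots=25):
    """Pick the n_shots largest consecutive-frame differences as shot
    boundaries (intensity of motion), as in section C.2."""
    diffs = np.array([np.abs(frames[i + 1].astype(float) - frames[i]).sum()
                      for i in range(len(frames) - 1)])
    # indices of the n_shots largest differences, returned in temporal order
    top = np.sort(np.argsort(diffs)[::-1][:n_shots])
    return top + 1   # boundary = index of the first frame of the new shot

def split_into_shots(frames, n_shots=25):
    # A shot is the run of frames between consecutive boundaries.
    bounds = [0] + list(shot_boundaries(frames, n_shots)) + [len(frames)]
    return [frames[a:b] for a, b in zip(bounds[:-1], bounds[1:]) if b > a]
```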

C.4 Clustering:

After obtaining the DCT coefficients (feature vectors) for all the selected frames of the shots, we separated the feature vectors belonging to each concept. For example, for the concept 'mountain', we searched all the videos for the shots labeled 'mountain'. We obtained the feature vectors from all the shots having the concept and ran the k-means clustering algorithm on these feature vectors. The clustering was done in a 100 dimensional space in which all feature vectors were points. We divided each concept into three clusters, and the three centroids were saved for each concept.

C.5 Automatic Annotation

The next task was to automatically annotate the videos. For this, a shot to be annotated was first converted to feature vectors (DCT coefficients for all the frames, or for selected frames). Clustering was then done on the resultant feature vectors, as described in the previous section. The centroids obtained were matched to the saved centroids of all the concepts, and the shot was labeled with the concepts for which the score was minimum. We used the sum of absolute differences as the distance measure. Below are the results for automatic annotation:

Video Name: Formula 1 2014 Australian Grand Prix Official Race Edit [1080p] - Video Dailymotion.avi
Given Labels for shot 2: people
Given Labels for shot 3: race, road
Given Labels for shot 4: race, people

Computed Labels for shot 2: road, people, building, race, person
Computed Labels for shot 3: road, race, people, building, person
Computed Labels for shot 4: building, person, people, race, road

The 9 frames from shot 4 of the video are shown in figure 4. The frames show that the annotation is not bad: the concept of building was missed in the manual labeling, but the automatic annotation did not miss it.

Figure 4

3. RESULTS

In this part of the report we discuss the results from the three approaches to video retrieval.

A. Content Based Video Retrieval

The results are based on the first seasons of the TV shows Merlin and Sons of Anarchy (which were used to make the data set).

A.1 Quantization Levels

In table 1, results are shown for different quantization levels. As discussed earlier, we only used histograms to compute the feature vectors, so these vectors do not carry any information regarding the concepts in the video, the objects in it, etc. Hence, the feature vectors of two entirely different TV shows (on the basis of concepts and objects) can give high similarity, as shown in the table. The table was computed by giving 'Merlin S01E11' as the query. The similarity measure used was absolute difference.

If we take retrieval with 256 quantization levels to be the perfect case, we can see that as the number of quantization levels decreases, the error (w.r.t. 256 quantization levels) in the retrieval of videos increases. For example, the table shows that for 128 quantization levels the top 10 retrieved videos remain the same but their ranks change, i.e. positions 4 and 5 are switched, positions 6 and 7 are also no longer the same, and so on. But if we are only concerned with the top 10 matches (irrespective of rank), this can work.
For quantization levels of 64 and 8, new videos are introduced in the top 10, which shows more error. Here again the top 3 videos are the same (not considering rank). So, depending on the application, any of the quantization levels can be used.

A.2 Global Features

Table 2 gives a comparison of global and local feature vectors. The table was computed for 256 quantization levels, using absolute difference as the similarity measure. As can be seen from table 2, only 2 of the top 10 retrieved videos are different, although the ranks can differ. But global features with the same quantization levels require much less space and are time efficient. Hence, depending upon the application, we can use global or local features.

References

[1] Yang Cai, Linjun Yang, 'Large Scale Near Duplicate Web Video Retrieval: Challenges and Approaches'

[2] Alexandre Karpenko, Parham Aarabi, 'Tiny Videos: A Large Data Set for Nonparametric Video Retrieval and Frame Classification'

B. Tiny Videos

The results are based on the first seasons of the TV shows Merlin and Sons of Anarchy (which were used to make the data set).

We first look at the effect of the correlation threshold on the results. The input video was a small segment of 'MerlinS01E01'. Table 3 shows the top 10 similar videos for different correlation values. We conclude that the results improve as the correlation value increases.

B.1 Effect of Size of Video


As discussed earlier, this method depends on the size of the video, so if we give the entire 'MerlinS01E01' as the query we get different results, shown in table 4. Here again the results improve as the correlation value increases.
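The correlation-threshold matching described in section 2.B can be sketched as follows (our own numpy sketch; the report does not give its exact correlation formula, so we assume the dot product of zero-mean, unit-norm frame vectors):

```python
import numpy as np

def normalize(frame):
    # Zero-mean, unit-norm feature vector; cancels brightness offsets
    # between two otherwise similar frames (section 2.B.1).
    v = frame.astype(float).ravel()
    v -= v.mean()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def video_correlation(frames_a, frames_b):
    """Maximum correlation over all key-frame pairs of two videos."""
    a = np.stack([normalize(f) for f in frames_a])
    b = np.stack([normalize(f) for f in frames_b])
    return float((a @ b.T).max())

def similar(frames_a, frames_b, threshold=0.9):
    # Two videos are declared similar if any frame pair correlates
    # above the threshold.
    return video_correlation(frames_a, frames_b) >= threshold
```

Raising the threshold toward 1 demands a near-exact frame match, which is why the retrieved lists in tables 3 and 4 tighten as the correlation value increases.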

C. Semantic Concepts
To retrieve videos, the user searches for a word, and the videos containing shots labeled with the entered word are shown on screen. The videos are ordered such that the video with the maximum number of shots labeled with the concept is shown at the top, and so on.
The results of searching for different key words are shown in table 5.
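This ranking rule can be sketched as follows (a hypothetical sketch: the annotation data structure and the example video names are our own illustration):

```python
def retrieve(annotations, keyword):
    """annotations: {video_name: [set_of_labels_per_shot, ...]}.
    Return the videos containing the keyword, ordered by the number
    of shots labeled with it (most shots first)."""
    counts = {video: sum(keyword in labels for labels in shots)
              for video, shots in annotations.items()}
    hits = [v for v, c in counts.items() if c > 0]
    return sorted(hits, key=lambda v: counts[v], reverse=True)
```

For example, a video with two shots labeled 'arrow' ranks above a video with one such shot.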

D. Comparison Between CBVR and Tiny Videos


Here we give a comparison of the results from the CBVR and tiny videos methods, with MerlinS01E01 as the input query. The results from the tiny videos based method are shown in table 4 for different correlation values. For CBVR, we used 256 quantization levels and local histograms; the results are in table 6.
As seen from the tables, there is a difference in the results of the two methods. The tiny videos method gives MerlinS01E01 and MerlinS01E02 for a correlation of 0.9, but MerlinS01E02 is not among the top 5 similar matches of CBVR. This can happen because the two techniques are based on totally different concepts: in tiny videos, even if a single shot matches with a certain correlation, the video is considered similar, whereas for CBVR every sampled frame should match considerably with every frame of the reference video for a considerable overall match.
Sr. No  256 Quant Levels         128 Quant Levels         64 Quant Levels          8 Quant Levels
1       Merlin S01E11            Merlin S01E11            Merlin S01E11            Merlin S01E11
2       Merlin S01E05            Merlin S01E05            Merlin S01E05            Sons.of.Anarchy.S01E10
3       Merlin S01E01            Merlin S01E01            Sons.of.Anarchy.S01E10   Merlin S01E05
4       Sons.of.Anarchy.S01E03   Sons.of.Anarchy.S01E10   Merlin S01E01            Sons.of.Anarchy.S01E06
5       Sons.of.Anarchy.S01E10   Sons.of.Anarchy.S01E03   Merlin S01E13            Sons.of.Anarchy.S01E12
6       Sons.of.Anarchy.S01E07   Merlin S01E13            Sons.of.Anarchy.S01E03   Merlin S01E13
7       Merlin S01E10            Sons.of.Anarchy.S01E07   Merlin S01E04            Sons.of.Anarchy.S01E02
8       Sons.of.Anarchy.S01E08   Sons.of.Anarchy.S01E05   Sons.of.Anarchy.S01E05   Sons.of.Anarchy.S01E01
9       Sons.of.Anarchy.S01E05   Merlin S01E10            Sons.of.Anarchy.S01E12   Merlin S01E04
10      Merlin S01E13            Sons.of.Anarchy.S01E08   Merlin S01E10            Sons.of.Anarchy.S01E11
Table 1: CBVR result for MerlinS01E11 as Query, Local Histogram

Sr. No  Local Feature Vector     Global Feature Vectors
1       Merlin S01E11            Merlin S01E11
2       Merlin S01E05            Sons.of.Anarchy.S01E10
3       Merlin S01E01            Sons.of.Anarchy.S01E09
4       Sons.of.Anarchy.S01E03   Sons.of.Anarchy.S01E05
5       Sons.of.Anarchy.S01E10   Merlin S01E05
6       Sons.of.Anarchy.S01E07   Sons.of.Anarchy.S01E07
7       Merlin S01E10            Sons.of.Anarchy.S01E08
8       Sons.of.Anarchy.S01E08   Sons.of.Anarchy.S01E03
9       Sons.of.Anarchy.S01E05   Sons.of.Anarchy.S01E06
10      Merlin S01E13            Merlin S01E01
Table 2: CBVR result for MerlinS01E11 as Query, Global Histogram

No  Corr = 0.5      Corr = 0.6               Corr = 0.7 – 0.999
1   Merlin S01E01   Merlin S01E01            Merlin S01E01
2   Merlin S01E02   Merlin S01E03
3   Merlin S01E03   Merlin S01E06
4   Merlin S01E04   Merlin S01E07
5   Merlin S01E05   Merlin S01E11
6   Merlin S01E06   Sons.of.Anarchy.S01E01
7   Merlin S01E07   Sons.of.Anarchy.S01E05
8   Merlin S01E08   Sons.of.Anarchy.S01E09
9   Merlin S01E09   Sons.of.Anarchy.S01E11
10  Merlin S01E10   Sons.of.Anarchy.S01E12
Table 3: Tiny Videos results, short segment of MerlinS01E01 as query

No  Corr = 0.5      Corr = 0.9
1   Merlin S01E01   Merlin S01E01
2   Merlin S01E02   Merlin S01E02
3   Merlin S01E03
4   Merlin S01E04
5   Merlin S01E05
6   Merlin S01E01
7   Merlin S01E02
8   Merlin S01E03
9   Merlin S01E04
10  Merlin S01E05
Table 4: Tiny Videos results, complete MerlinS01E01 as query

No  Key Word: Mountain       Key Word: Vehicle        Key Word: arrow
1   Sons of Anarchy S01E01   Sons of Anarchy S01E02   MerlinS01E01
2   MerlinS01E01             Sons of Anarchy S01E01   MerlinS01E02
3   MerlinS01E02             MerlinS01E01             MerlinS01E03
4   MerlinS01E03             MerlinS01E02             Sons of Anarchy S01E01
5   Sons of Anarchy S01E02   MerlinS01E03             Sons of Anarchy S01E02
Table 5: Semantic Concept Result
No  Result
1 Merlin S01E01
2 Merlin S01E05
3 Sons of Anarchy S01E03
4 Merlin S01E10
5 Sons of Anarchy S01E07
Table 6: CBVR result for MerlinS01E01 as Query, Local Histogram
