Changchang Wu
Content
Basic Usage
Using the VisualSFM GUI
Using VisualSFM through command-line
Dependency on SiftGPU/PBA and PMVS/CMVS
Typical hardware requirements
Something to watch out for with high-resolution images
Advanced Usage
Specify your pair-list for matching
Use your own feature matches
Use your own feature detectors
Adjust parameters for higher speed
Technical Details
The camera model and coordinate system
Focal length initialization
The intermediate: features and matches
The output format: N-View Match (NVM)
Work with the GUI
Visualization
Mouse Controls and Navigation
Keyboard shortcuts
Some Cool Features
Command Lines within VisualSFM!
Show animations easily with VisualSFM
Manual Intervention of reconstruction
Limitations and Issues
Potential problem with small GPU memory
Limitations of VisualSFM
Multi-threading stability
Frequently Asked Questions (FAQs)
1. The meaning of some typical error messages (e.g. ERROR1).
2. Where can I download the depending tools?
3. Dense reconstruction tools other than PMVS?
4. What if some of the GPU computations do not work?
5. What if VisualSFM fails to find two images for initialization?
6. What if VisualSFM produces multiple models?
7. Is VisualSFM open-sourced? Programmable control of VisualSFM?
4. Dense reconstruction using Yasutaka Furukawa's CMVS/PMVS (obtain the CMVS package yourself!)
"Sfm->Reconstruct Dense". Note that CMVS/PMVS related parameters are stored in nv.ini
You will be prompted to save a [name].nvm file, and CMVS will run in the folder [name].nvm.cmvs
If you save the reconstruction to [name].nvm, [name].i.ply is the result of the i-th model.
* Windows users need to use "cmd /c VisualSFM ..." when using VisualSFM in a batch file.
During the incremental reconstruction step, if you press "Ctrl+C" once, it will quit the loop
and save the current model. If you press "Ctrl+C" THREE times, exit(0) will be called.
Using command lines should be slightly faster on large datasets, because it does not generate
thumbnails (texture) for visualization. In the GUI mode, you can also skip the pixel loading
part by clicking pause "||" button when it says "loading pixel data...".
Dependency on SiftGPU/PBA and PMVS/CMVS
The VisualSFM GUI can run without those libraries. However, you need SiftGPU for feature
detection and matching, PBA for sparse reconstruction, and PMVS/CMVS for dense reconstruction.
SiftGPU & PBA are the GPU-accelerated feature detection and bundle adjustment for the SfM
system. (Click here if SiftGPU or PBA doesn't work for you)
PMVS/CMVS is the dense reconstruction module used by VisualSFM. Yasutaka Furukawa's original
CMVS is here. Windows binaries can be found in the SfM packages distributed by Pierre Moulon.
*. GPU computation does not work over Windows Remote Desktop; use VNC for remote work.
*. If you do not have good graphic cards, click here for the solution.
Something to watch out for with high-resolution images (when using SiftGPU)
There is an important (modifiable) parameter for SiftGPU feature detection.
maximum_working_dimension = 3200 by default.
Given an image of size d = max(w, h), Lowe's original SIFT detection starts with the up-sampled
image of sz = 2 * d. SiftGPU actually starts the detection with size
sz' = max{sz/2^i | sz/2^i <= maximum_working_dimension, integer i >= 0};
For example, if your image size is 1024, 1600, 2048, 3200, 4000, the corresponding sizes after
the adjustment are 2048, 3200, 2048, 3200, 2000 respectively.
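The working-size rule above can be sketched in a few lines (an illustrative helper, not VisualSFM's code; `max_dim` stands in for the maximum_working_dimension parameter):

```python
def working_size(image_dim, max_dim=3200):
    """Return the size SiftGPU starts detection at, per the rule above.

    Lowe's SIFT starts from the up-sampled size sz = 2 * max(w, h);
    SiftGPU then halves sz until it fits within max_dim.
    """
    sz = 2 * image_dim
    while sz > max_dim:
        sz //= 2
    return sz

# Reproduces the example above for sizes 1024, 1600, 2048, 3200, 4000
sizes = [working_size(d) for d in (1024, 1600, 2048, 3200, 4000)]
# -> [2048, 3200, 2048, 3200, 2000]
```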
This is usually not a problem. Sometimes you want more features for a better result, so
you may not want size 4000 reduced to 2000. In such cases, you can do the following:
1. If you are sure you have enough memory for size 4000,
you should change the parameter by "Tools -> Enable GPU -> Set Maximum DIM"
For example, you can compute 3-pair matching with the following file
a.jpg b.jpg
a.jpg c.jpg
b.jpg c.jpg
For simple video sequences, where you want to match every two frames in a certain range, use
"SfM->Pairwise matching->Compute Sequence Match". For large datasets, you may use recognition
methods (Vocabulary tree or GIST clustering) to find the image pairs for matching.
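For a simple sequence, such a pair list can be generated with a sliding window (a sketch with made-up frame names, similar in spirit to what "Compute Sequence Match" produces):

```python
def sequence_pairs(frames, window):
    """Pair each frame with the next `window` frames of the sequence."""
    pairs = []
    for i, a in enumerate(frames):
        for b in frames[i + 1 : i + 1 + window]:
            pairs.append((a, b))
    return pairs

frames = ["f000.jpg", "f001.jpg", "f002.jpg", "f003.jpg"]
for a, b in sequence_pairs(frames, window=2):
    print(a, b)  # one "imageA imageB" pair per line, ready for the pair-list file
```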
For example, the following gives the 24 matches between 888.jpg and 709.jpg
888.jpg 709.jpg 24
19 18 24 3651 1511 2899 71 115 201 202 199 1639 2595 210 189 1355 268 241 137 728 1899 193 192 325
139 143 181 261 342 349 373 433 622 623 686 700 745 812 868 951 987 990 1001 1016 1021 1046 1047 1069
where feature #19 of 888.jpg matches feature #139 of 709.jpg
Make sure the 0-based feature indices are within the correct ranges!
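A minimal reader for one such block might look as follows (illustrative only; it assumes exactly the three-line layout shown above, with a shortened example):

```python
def parse_match_block(lines):
    """Parse one block of the match-file layout described above.

    Block layout: "imageA imageB count", then `count` 0-based feature
    indices for imageA, then `count` indices for imageB.
    """
    img_a, img_b, count = lines[0].split()
    idx_a = [int(t) for t in lines[1].split()]
    idx_b = [int(t) for t in lines[2].split()]
    assert len(idx_a) == len(idx_b) == int(count)
    return img_a, img_b, list(zip(idx_a, idx_b))

block = [
    "888.jpg 709.jpg 3",
    "19 18 24",
    "139 143 181",
]
a, b, matches = parse_match_block(block)
# matches[0] == (19, 139): feature #19 of 888.jpg matches feature #139 of 709.jpg
```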
Option 2. Write the .sift files in Lowe's ASCII format; they will be automatically converted to
the VisualSFM binary format. (Let me know if there is a bug.)
[Location Data] is an npoint x 5 float matrix, and each row is [x, y, color, scale, orientation].
Write color by casting the float to unsigned char[4]
scale & orientation are only used for visualization, so you can simply write 0 for them
* Sort features in the order of decreasing importance, since VisualSFM may use only part of those features.
* VisualSFM sorts the features in the order of decreasing scales.
[Descriptor Data] is a npoint x 128 unsigned char matrix. Note the feature descriptors are normalized to 512.
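A sketch of emitting such an ASCII file (the row-before-column field order follows Lowe's siftDemo convention, which is an assumption here; verify against your own data before relying on it):

```python
def lowe_ascii(keypoints, descriptors):
    """Format features as Lowe-style ASCII keypoint text and return it.

    keypoints: list of (x, y, scale, orientation); descriptors: one list of
    128 ints per keypoint (VisualSFM expects descriptors normalized to 512).
    """
    lines = [f"{len(keypoints)} 128"]  # header: count and descriptor length
    for (x, y, scale, ori), desc in zip(keypoints, descriptors):
        lines.append(f"{y} {x} {scale} {ori}")  # row (y) before col (x)
        for i in range(0, 128, 20):             # wrap descriptor values
            lines.append(" ".join(str(v) for v in desc[i:i + 20]))
    return "\n".join(lines) + "\n"
```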
Lots of time might be spent on saving results (especially on NFS), so a beta feature is now
enabled by default under Linux to improve efficiency. It can be disabled by setting
param_asynchronous_write = 0 or by unchecking "Pairwise Matching->Asynchronous Writes".
NOTE that the parameters saved in the NVM file are slightly different from the internal
representation. Instead, NVM saves the following for each camera:
f, R (as quaternion), C = - R'T, rn = r * f * f.
The PBA code includes functions for loading the NVM file and converting the camera parameters.
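Recovering the internal translation from the stored center can be sketched as follows (a plain-Python illustration of C = -R'T, i.e. T = -R*C; the PBA loader performs the equivalent conversion):

```python
def quat_to_R(q):
    """Rotation matrix from a unit quaternion (w, x, y, z), as stored in NVM."""
    w, x, y, z = q
    return [
        [1 - 2*(y*y + z*z), 2*(x*y - z*w),     2*(x*z + y*w)],
        [2*(x*y + z*w),     1 - 2*(x*x + z*z), 2*(y*z - x*w)],
        [2*(x*z - y*w),     2*(y*z + x*w),     1 - 2*(x*x + y*y)],
    ]

def center_to_translation(q, C):
    """NVM stores the camera center C; since C = -R'T, the translation is T = -R*C."""
    R = quat_to_R(q)
    return [-sum(R[i][j] * C[j] for j in range(3)) for i in range(3)]

# Identity rotation: T is simply -C
# center_to_translation((1, 0, 0, 0), (1.0, 2.0, 3.0)) -> [-1.0, -2.0, -3.0]
```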
As for the image coordinate system, X-axis points right, and Y-axis points downward, so Z-axis
points forward.
The principal points are assumed to be at image centers except when using a single fixed
calibration. When using fixed calibration [fx, cx, fy, cy], the error is estimated in a
transformed image coordinate system.
K = [fx 0 0; 0 fx 0; 0 0 1], Kc = [1 0 cx; 0 fy/fx cy; 0 0 1],
You can see that Kc * K = [fx 0 cx; 0 fy cy; 0 0 1];
Let (u, v, 1)' be an original feature location (relative to the top-left corner of the image)
The measurement is defined as (mx, my, 1)' = Inv(Kc) * (u, v, 1)'
Note: the feature locations saved in NVM files are still relative to the image centers rather
than the calibrated principal point [cx, cy];
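With the Kc above, Inv(Kc) works out to a per-axis shift and scale, so the measurement can be computed directly (a sketch following the matrix definitions above; when fx == fy it reduces to subtracting the principal point):

```python
def measurement(u, v, fx, cx, fy, cy):
    """Transform a feature location (u, v), relative to the top-left corner,
    into the measurement frame used with a fixed calibration:
    (mx, my, 1)' = Inv(Kc) * (u, v, 1)' with Kc = [1 0 cx; 0 fy/fx cy; 0 0 1].
    """
    mx = u - cx
    my = (fx / fy) * (v - cy)
    return mx, my
```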
TIP: Keep the original EXIF if you resize JPEGs! (e.g. XNView can do that)
In case the EXIF does not contain that information, the focal length is set to 1.2 *
max(width, height), which corresponds to a medium viewing angle.
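As a sketch of the initialization (the EXIF-based variant is a common approximation, not necessarily VisualSFM's exact code path; `sensor_width_mm` must come from a CCD-width database or the EXIF itself):

```python
def default_focal(width, height):
    """Fallback focal length in pixels when EXIF gives no usable value:
    1.2 * max(width, height), corresponding to a medium viewing angle."""
    return 1.2 * max(width, height)

def focal_from_exif(focal_mm, sensor_width_mm, image_width):
    """Common EXIF-based estimate: scale the metric focal length by the
    number of pixels per millimeter of the sensor."""
    return focal_mm / sensor_width_mm * image_width
```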
VisualSFM also supports using a single fixed calibration [fx, cx, fy, cy] for all images.
Use menu "SfM -> More functions -> Set Fixed Calibration"
TIP: When adding new photos (or close/restart), VisualSFM will match only what is missing.
If you want to change feature-detection parameters and re-run the reconstruction, you need to
delete all the corresponding .sift and .mat files.
If you want to read back the matching results, check the match file code here; or you can use
"SfM->Pairwise Matching->Export F-Matrix Matches".
The [optional calibration] line exists only if you use the "Set Fixed Calibration" function
FixedK fx cx fy cy
Each reconstructed <model> contains the following
<Number of cameras> <List of cameras>
<Number of 3D points> <List of points>
Check the LoadNVM function in util.h of the Multicore Bundle Adjustment code for more details.
The LoadNVM function reads only the first model, so call it repeatedly to get all of them. Note
that whitespaces in <file name> are replaced by '"'.
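A minimal camera-list reader for this layout might look like the following (a sketch that assumes the common NVM_V3 camera line "<name> f qw qx qy qz Cx Cy Cz r 0" and ignores the optional calibration and the point list; see LoadNVM for the authoritative reader):

```python
def parse_nvm_cameras(text):
    """Read the camera list of the first model from NVM_V3 text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    assert lines[0].startswith("NVM_V3")
    ncam = int(lines[1])
    cameras = []
    for ln in lines[2 : 2 + ncam]:
        tok = ln.split()
        cameras.append({
            "name": tok[0].replace('"', ' '),          # whitespace was saved as '"'
            "f": float(tok[1]),                        # focal length
            "q": tuple(float(t) for t in tok[2:6]),    # rotation quaternion (w,x,y,z)
            "C": tuple(float(t) for t in tok[6:9]),    # camera center
            "r": float(tok[9]),                        # radial distortion
        })
    return cameras
```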
Visualization
* Use the menu items under Menu->View to choose a view mode or change parameters.
* Quicker switch can be done with mouse, keyboard shortcuts, or toolbar buttons.
* You can save the current view as a JPEG file or copy it into memory (paste to ppt?)
* More documentation is coming
Keyboard Shortcuts
'Z' zoom fit for 2D Views; reset for 3D views.
'backspace' return to thumbnail view
'TAB' switch between dense 3D model and sparse 3D model
'TAB' switch between original image and its undistorted view
'Pause' show only points seen by current camera
'T' Switch between 3D visualization mode: camera+point/camera/point
'S' switch SIFT display style in single image view mode
'F' show/hide features in single image view mode
'F' show/hide camera in Dense reconstruction mode
'A' show/hide coordinate axis in dense/sparse reconstruction
'left/right' switch to previous/next image/pair...
'up/down' switch to previous/next 3D model
...
Type "am" in the internal command line, and you will see the full list of commands for making
animations.
In case there are incorrectly registered cameras, you can delete them:
* Select a camera by right-clicking it in the 3D N-View Points mode
* Delete the camera from the 3D model by clicking the (hand) button
You can also manually choose the initialization pair, see FAQ below.
You can also try disabling the down-sampling and re-running the experiments.
1, delete all the .sift and .mat files
2, choose "Tools -> Enable GPU -> Customized Param"
Add "-nomc" to the parameter list, which means "no memory cap"
3, re-run matching and reconstruction.
*, this is not guaranteed to work, because it can still run out of memory.
*, this may work, but it can become extremely slow because of virtual-memory use.
Limitations of VisualSFM
1. The keypoint-based reconstruction only works with textured surfaces.
- This tool will not work if too few features are detected
- e.g. white wall, uniform color objects
3. There is only one radial distortion parameter. It is sufficient for commodity photos, but
may not work if your camera has large distortions that do not fit the model.
4. The 32-bit versions may easily run out of memory on large datasets due to OS limitations.
Multi-threading stability
During the reconstruction process, the points and cameras are modified while they are also
being visualized. Since I do not want the reconstruction to be slowed down by the
visualization, very limited mutex control is used.
Some memory management has been done to make it very unlikely to crash when you view the
live reconstruction (rotating, zooming, clicking, ...), but there is still a very small
chance of a memory leak.
This is not a problem for the console mode. Similarly, if you simply minimize the GUI window
or avoid too much mouse dragging, the chance of such a memory leak is reduced.
* Usually this happens if you do not have the CMVS binaries in the right place.
VisualSFM[.exe] expects the following in the same folder:
-- cmvs.exe/pmvs2.exe/genOption.exe [pthreadVC2.dll] for Windows,
-- cmvs/pmvs2/genOption for other OS.
See FAQ-2 if you haven't downloaded them yet.
* Note these commands may also fail due to bad input or their own bugs. You should
try running these commands yourself from your system terminal and look at the
original log from those binaries.
3. CMVS/PMVS
Compile from the original source
Windows binaries by Pierre Moulon
cmvs.exe/pmvs2.exe/genOption.exe and pthreadVC2.dll
FAQ-3. Dense reconstruction tools other than PMVS?
VisualSFM natively supports running the CMVS/genOption/PMVS2 tool chain, but you need to use
Meshlab to convert the dense points to a mesh.
CMP-MVS by Michal Jancosek: this tool produces meshes directly. You need to click VisualSFM's
'Reconstruct Dense' function and select 'NVM->CMP-MVS' to convert the data.
SURE by Mathias Rothermel and Konrad Wenzel: SURE natively supports the NVM file format, so you
do not need to run the export function.
MVE by Michael Goesele's research group: this tool works with the same format of exported data
as CMVS/PMVS
MeshRecon by Zhuoliang Kang: this tool produces meshes directly and works with the NVM file (no
radial distortion, Windows only).
CPU replacement for feature detection (not needed for most cards, even Intel)
1. Obtain a SIFT binary (Lowe's or VLFeat).
2. Put the SIFT binary in the same directory.
3. Choose Lowe's binary or the VLFeat binary by setting param_use_vlfeat_or_lowe.
4. Disable GPU feature detection by "Tools->Enable GPU->Disable SiftGPU",
or modify the parameter param_use_siftgpu to 0 for a permanent selection.
Remote solution:
1. Run VisualSFM on a CUDA-capable Linux server through terminal (X-terminal for GUI).
2. You may copy the workspace back to local machine for visualization.
Using CPU for feature matching (not needed for most cards, even Intel)
Disable GPU feature matching by unchecking "Tools->Enable GPU->Match Using GLSL".
*. Modify the parameter param_use_siftmatchgpu to 0 for a permanent selection.
Using CPU for multicore bundle adjustment (the GPU-based PBA does not support ATI)
Windows: Use the non-CUDA packages.
Linux: Download the PBA code, run make with makefile_no_gpu, rename libpba_no_gpu.so
to libpba.so, and copy it to the VisualSFM bin folder.
Option 1. Switch to "2-View 3D points", use left/right to scan through all image pairs, find a
pair with 3D points, choose "SfM->More Functions->Set Initialization Pair".
Option 2. If you know which two images are good, right click the two images in the "Thumbnail
View", verify it in "2-View 3D points", and choose "Set Initialization Pair".
Second, if feature detection and matching work correctly, there are two different cases:
The common case: the two models do not share enough images. The problem is caused by a lack of
feature matches between very different viewpoints. You should take more pictures to cover the
transitional viewpoints.
The rare case: the two models share enough images (param_image_reuse_max). Try using "SfM->More
Functions->Merge Sparse Models" (V0.5.26+) and see if it merges the models. Such problems are
often caused by the choice of the initialization pair. You may delete one model and try
resuming the reconstruction of the kept model.
However, I have already open-sourced SiftGPU (feature detection & matching) and PBA (multicore
bundle adjustment), which can be easily integrated into any other SfM system. You can also make
changes to them and reintegrate them with VisualSFM.
If you want some simple programmable control of the SfM system, you can either use VisualSFM
through command line, or use VisualSFM as a server through socket command interface (Python
wrapper by Nick Rhinehart).