DLProf#
DLProf (dlprof) is the Deep Learning Profiler from Nvidia and can be used to profile deep learning Python scripts that use PyTorch or TensorFlow.
DLProf Viewer (dlprofviewer) provides a browser-based dashboard for visually analyzing results from dlprof.
For details see the documentation for DLProf and the Release Notes at Nvidia.
Note: DLProf's PyTorch support is limited to versions < 2.0.
Installation#
Prerequisites
A Python module must be loaded, e.g. by executing
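On a cluster that uses environment modules, this could look as follows; the exact module name is site-specific and only an assumption here:

```shell
# load a Python module (the module name depends on your site)
module load Python
```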
Installation steps
1. Optionally: load or create a conda environment or virtual environment into which DLProf will be installed.
2. Install nvidia-pyindex first; DLProf requires it. This adds an Nvidia repository to pip.
3. Install DLProf for the framework you are targeting, e.g. PyTorch or TensorFlow. Choose one.
4. Install DLProf Viewer.
The installation steps as one code snippet:
```shell
pip install nvidia-pyindex
# for PyTorch:
pip install nvidia-dlprof[pytorch]
# for TensorFlow:
pip install nvidia-dlprof[tensorflow]
pip install nvidia-dlprofviewer
```
Usage#
Only profile for a short amount of time.
Profiling can create a lot of data and slow down training or inference. Hence, run the profiler only for one epoch or a limited number of batches inside an epoch.
The required steps differ depending on whether you want to profile PyTorch or TensorFlow.
Profiling PyTorch scripts#
For PyTorch, add the following code snippet to the script that should be profiled:

```python
import nvidia_dlprof_pytorch_nvtx
nvidia_dlprof_pytorch_nvtx.init()
...
# put around your training/inference loop
# or the part you want to profile
with torch.autograd.profiler.emit_nvtx():
    <training/inference loop>
```
This imports the corresponding modules and profiles everything inside the with torch.autograd.profiler.emit_nvtx(): block.
It is also possible to place the with torch.autograd.profiler.emit_nvtx(): statement inside the training/inference loop and use it only for a limited number of batches.
Run your script by prepending dlprof --mode=pytorch:
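For example, where train.py and its arguments are placeholders for your own script:

```shell
# profile a PyTorch script with DLProf
dlprof --mode=pytorch python train.py [args to the script]
```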
Several *.sqlite and *.qdrep files will be created.
To show a textual report use:
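For example, a summary report can be requested via the --reports flag when profiling; train.py is again a placeholder:

```shell
# print a textual summary report after profiling
dlprof --mode=pytorch --reports=summary python train.py
```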
Profiling TensorFlow (TF) scripts#
For TensorFlow scripts, just prepend dlprof when invoking your script:
```shell
# for TensorFlow 1.x
dlprof --mode=tensorflow1 <python tf script> [args to tf script]
# for TensorFlow 2.x
dlprof --mode=tensorflow2 <python tf script> [args to tf script]
```
Several *.sqlite and *.qdrep files will be created.
To show a textual report use:
Passing options to Nvidia Nsight Systems (nsys)#
DLProf uses Nvidia Nsight Systems (nsys) for profiling.
You can pass options to nsys via the --nsys_opts flag:
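For example, to restrict nsys tracing to CUDA and NVTX events (the options inside the quotes are passed through to nsys; train.py is a placeholder):

```shell
# forward "-t cuda,nvtx" to the underlying nsys invocation
dlprof --mode=pytorch --nsys_opts="-t cuda,nvtx" python train.py
```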
Troubleshooting#
Overwrite existing files#
DLProf does not overwrite existing tracing files by default.
Specify the flag --force=true to do so:
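For example (train.py is a placeholder for your own script):

```shell
# overwrite tracing files from a previous run
dlprof --mode=pytorch --force=true python train.py
```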
Nsight Systems (nsys) errors after profiling#
Errors of the following form from Nsight Systems can indicate that Nsight Systems is incompatible with the CUDA version in use.
DLProf installs its own version of Nsight Systems via the nvidia-nsys-cli package, which is unrelated to the Nsight Systems that comes with the (possibly different) CUDA Toolkit version your application uses.
As a workaround, a newer Nsight Systems version is required. Two options are available:
- Load a newer CUDA module. Even if your application does not use this CUDA module, DLProf will use the Nsight Systems this module provides.
- Manually download a newer nvidia-nsys-cli package, i.e. a nvidia_nsys_cli ... .whl file from https://developer.download.nvidia.com/devtools/nsight-systems/, and install it via pip.
Example errors from Nsight Systems indicating an incompatibility with the used CUDA version:
```
Error {
  Type: RuntimeError
  SubError {
    Type: ProcessEventsError
    Props {
      Items {
        Type: ErrorText
        Value: "/build/agent/work/20a3cfcd1c25021d/QuadD/Host/Analysis/Modules/StringStorage.cpp(149): Throw in function QuadDCommon::StringId QuadDAnalysis::StringStorage::GetKeyForExteriorId(QuadDAnalysis::GlobalProcess, QuadDAnalysis::StringStorage::ExteriorId) const\nDynamic exception type: boost::exception_detail::clone_impl<QuadDCommon::LogicException>\nstd::exception::what: LogicException\n[QuadDCommon::tag_message*] = Cannot find bucket for a bucket index\n"
      }
    }
  }
}
Status: TargetProfilingFailed
Props {
  Items {
    Type: DeviceId
    Value: "Local (CLI)"
  }
}
Error {
  Type: RuntimeError
  SubError {
    Type: ProcessEventsError
    Props {
      Items {
        Type: ErrorText
        Value: "/build/agent/work/20a3cfcd1c25021d/QuadD/Host/Analysis/Modules/TraceProcessEvent.cpp(45): Throw in function const string& {anonymous}::GetCudaCallbackName(bool, uint32_t, const QuadDAnalysis::MoreInjection&)\nDynamic exception type: boost::exception_detail::clone_impl<QuadDCommon::InvalidArgumentException>\nstd::exception::what: InvalidArgumentException\n[QuadDCommon::tag_message*] = Unknown runtime API function index: 430\n"
      }
    }
  }
}
```
Viewing profile results with DLProf Viewer#
Warning
Running dlprofviewer opens a dashboard under http://127.0.0.1:8000. This is accessible to all users who currently have jobs scheduled on the same compute node.
Run dlprofviewer:
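For example, assuming the default database file name produced by dlprof:

```shell
# open the dashboard for the profile database created by dlprof
dlprofviewer dlprof_dldb.sqlite
```

If port 8000 is already in use, a different port can be selected; see dlprofviewer --help for the available options.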
This starts a dashboard as a web application that you can access with your browser under http://127.0.0.1:8000.
For more details see Nvidia's official documentation.
Viewing remote dashboards#
In case you are running dlprofviewer on a front end or cluster node and want to view the dashboard with your local browser, you have to create a port forwarding via ssh.
The port forwarding tunnels a connection to the dashboard from your local machine to the corresponding cluster node.
Prerequisites
If you are running dlprofviewer on a cluster node, make sure you have configured the Template for connecting to cluster nodes in your local ~/.ssh/config and are able to connect to a cluster node from your local machine.
Setup
For setting up the port forwarding the steps are:
1. On the front end or cluster node, obtain the fully qualified domain name (FQDN), from here on called remote-fqdn:
2. On the front end or cluster node, start dlprofviewer if it is not already running:
3. On your local machine, create a port forwarding to the remote machine:
   This opens a new SSH session to the remote machine and forwards the local port 8000 to the remote machine. When you exit the terminal, the port forwarding is stopped as soon as the last connection terminates.
4. Open a browser on your local machine and navigate to http://localhost:8000. You should now see the DLProf Viewer dashboard.
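The steps above can be sketched as follows; remote-fqdn and the database file name are placeholders, and the exact commands may differ on your cluster:

```shell
# 1. on the front end or cluster node: obtain the FQDN (remote-fqdn)
hostname -f

# 2. on the front end or cluster node: start the viewer
dlprofviewer dlprof_dldb.sqlite

# 3. on your local machine: forward local port 8000 to the dashboard
ssh -L 8000:localhost:8000 <remote-fqdn>
```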